🤖 AI Summary
Current clinical risk scores for clear cell renal cell carcinoma (ccRCC), such as the Leibovich score, do not integrate preoperative imaging with postoperative histopathology, limiting personalized recurrence risk prediction. Method: We propose a modular multimodal deep learning framework that jointly models preoperative contrast-enhanced CT scans and postoperative whole-slide images (WSIs), leveraging pretrained encoders (ResNet-18 for CT, TITAN-CONCH for WSIs) and systematically comparing late versus intermediate fusion strategies. Results: The WSI-only model significantly outperforms the CT-only model; intermediate fusion improves performance further, achieving a C-index of 0.73, comparable to the recalibrated Leibovich score (0.75), with excellent calibration. This work presents the first systematic validation of cross-scale CT–WSI intermediate fusion for ccRCC survival prediction, establishing a paradigm for leveraging multimodal foundation models in precision oncologic prognosis.
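The intermediate fusion described above can be sketched as concatenating per-patient CT and WSI embeddings before a Cox survival head. The embedding sizes, the random features, and the linear head below are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
ct_emb = rng.normal(size=(n, 512))    # stand-in for ResNet-18 CT features (size assumed)
wsi_emb = rng.normal(size=(n, 768))   # stand-in for TITAN-CONCH slide features (size assumed)

# Intermediate fusion: concatenate per-patient embeddings before the survival head.
fused = np.concatenate([ct_emb, wsi_emb], axis=1)   # shape (n, 1280)

# Linear Cox head: risk score = beta . x (beta would be learned in practice).
beta = rng.normal(scale=0.01, size=fused.shape[1])
risk = fused @ beta

def neg_cox_partial_log_likelihood(risk, time, event):
    """Breslow-style negative Cox partial log-likelihood."""
    order = np.argsort(-time)                   # sort patients by descending time
    risk, event = risk[order], event[order]
    log_cumsum = np.logaddexp.accumulate(risk)  # log-sum-exp of risk over each risk set
    return -np.sum((risk - log_cumsum)[event.astype(bool)])

time = rng.uniform(1, 60, size=n)     # synthetic follow-up times (months)
event = rng.integers(0, 2, size=n)    # synthetic event indicators
loss = neg_cox_partial_log_likelihood(risk, time, event)
```

In a training loop, this loss would be minimized with respect to both the head weights and (optionally) the fused encoders; late fusion would instead train separate Cox heads per modality and combine their risk scores afterward.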
📝 Abstract
Recurrence risk estimation in clear cell renal cell carcinoma (ccRCC) is essential for guiding postoperative surveillance and treatment. The Leibovich score remains widely used for stratifying distant recurrence risk but offers limited patient-level resolution and excludes imaging information. This study evaluates multimodal recurrence prediction by integrating preoperative computed tomography (CT) and postoperative histopathology whole-slide images (WSIs). A modular deep learning framework with pretrained encoders and Cox-based survival modeling was tested across unimodal, late fusion, and intermediate fusion configurations. In a real-world ccRCC cohort, WSI-based models consistently outperformed CT-only models, underscoring the prognostic strength of pathology. Intermediate fusion improved performance further, with the best model (TITAN-CONCH with ResNet-18) approaching the recalibrated Leibovich score. Random tie-breaking narrowed the gap between the clinical baseline and the learned models, suggesting that discretization may overstate individualized performance. Even with fusion restricted to simple embedding concatenation, radiology added value primarily through fusion rather than as a standalone modality. These findings demonstrate the feasibility of foundation-model-based multimodal integration for personalized ccRCC risk prediction. Future work should explore more expressive fusion strategies, larger multimodal datasets, and general-purpose CT encoders to better match the modeling capacity available for pathology.
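The role of tie-breaking mentioned above can be seen in a minimal sketch of Harrell's C-index (the data below is illustrative, not from the study): a discretized score such as the Leibovich risk classes produces many tied risk pairs, each conventionally counted as 0.5, so how ties are handled shifts the estimate.

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell's C-index: among comparable patient pairs, the fraction where
    the patient with the earlier observed event has the higher predicted risk.
    Tied risk predictions contribute 0.5 per pair."""
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:                   # censored patients anchor no pairs
            continue
        for j in range(len(time)):
            if time[j] > time[i]:          # comparable: j outlives i's event
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:   # tie in predicted risk
                    concordant += 0.5
    return concordant / comparable

time = np.array([2.0, 4.0, 6.0, 8.0])      # synthetic follow-up times
event = np.array([1, 1, 0, 1])             # synthetic event indicators
risk = np.array([4.0, 3.0, 2.0, 1.0])      # perfectly ordered continuous risk
```

A constant (fully tied) score yields a C-index of exactly 0.5 here, which is why breaking ties at random in a coarse clinical score can change its apparent individualized performance.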