A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer

📅 2025-08-22
🤖 AI Summary
This study targets the high diagnostic uncertainty, and the overtreatment that follows from it, in the non-invasive evaluation of renal masses. It develops an end-to-end renal tumor analysis system that covers imaging analysis, diagnostic decision-making, and prognostic prediction. The system rests on a disease-centric, two-stage pre-training paradigm. Domain knowledge is first used to strengthen the visual and textual encoders, and the two encoders are then aligned across modalities through contrastive learning. Built on a vision-language foundation model architecture, the system combines domain-adaptive pre-training with zero-shot transfer. Across ten clinical tasks it outperforms state-of-the-art methods. It reaches a C-index of 0.726 for recurrence-free survival prediction on the TCIA cohort. On diagnostic classification, it matches the peak performance of baselines fine-tuned on 100% of the labeled data while using only 20% of it. These results suggest potential gains in diagnostic accuracy and more precise, individualized patient management.
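The C-index quoted above measures how well predicted risk ranks patients by time to recurrence. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. The following is a minimal sketch of Harrell's estimator, assuming right-censored follow-up data and a model that outputs one scalar risk score per patient, where a higher score means earlier expected recurrence. The function name `harrell_c_index` and the toy cohort are hypothetical, and the summary does not describe the paper's actual evaluation code. For simplicity, pairs with tied follow-up times are skipped.

```python
import numpy as np


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: observed follow-up time per patient
    events: 1 if recurrence was observed, 0 if the patient was censored
    risk_scores: model output per patient; higher means earlier expected recurrence
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    risk_scores = np.asarray(risk_scores, dtype=float)

    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            # A pair is comparable only if the patient with the shorter time had an observed event.
            continue
        for j in range(len(times)):
            if times[i] < times[j]:  # tied follow-up times are skipped in this simple version
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # the earlier recurrence received the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


if __name__ == "__main__":
    # Toy cohort with illustrative values, not study data.
    times = [5, 12, 20, 30]
    events = [1, 0, 1, 0]
    risks = [0.5, 0.4, 0.6, 0.1]
    print(harrell_c_index(times, events, risks))  # prints 0.75
```

In the toy cohort, four pairs are comparable. The patient with the first recurrence is ranked below the patient who recurs at time 20, which gives 3 of 4 concordant pairs.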

📝 Abstract
The non-invasive assessment of renal masses, which are increasingly discovered incidentally, is a critical challenge in urologic oncology. In this setting, diagnostic uncertainty frequently leads to the overtreatment of benign or indolent tumors. In this study, we developed and validated RenalCLIP, a vision-language foundation model for the characterization, diagnosis, and prognosis of renal masses. It was built on a dataset of 27,866 CT scans from 8,809 patients across nine Chinese medical centers and the public TCIA cohort. The model was developed with a two-stage pre-training strategy. The first stage enhances the image and text encoders with domain-specific knowledge, and the second aligns them through a contrastive learning objective, yielding robust representations that support generalization and diagnostic precision. RenalCLIP was evaluated on 10 core tasks spanning the full clinical workflow of kidney cancer, including anatomical assessment, diagnostic classification, and survival prediction. On these tasks it outperformed, and generalized better than, other state-of-the-art general-purpose CT foundation models. On complex tasks such as recurrence-free survival prediction in the TCIA cohort, RenalCLIP achieved a C-index of 0.726, an improvement of approximately 20% over the leading baselines. Its pre-training also conferred notable data efficiency. On the diagnostic classification task, it needed only 20% of the training data to match the peak performance of all baseline models, even after those baselines were fully fine-tuned on 100% of the data. Additionally, it achieved superior performance in report generation, image-text retrieval, and zero-shot diagnosis. Our findings establish RenalCLIP as a robust tool with the potential to enhance diagnostic accuracy, refine prognostic stratification, and personalize the management of patients with kidney cancer.
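The abstract reports results on zero-shot diagnosis. In CLIP-style models, zero-shot classification typically works by embedding one text prompt per diagnostic class and assigning each image to the class whose prompt embedding is most similar. The following is a minimal sketch under stated assumptions. The image and text embeddings are precomputed and share one dimension. The prompts, the 3-d toy vectors, the temperature of 0.01, and the function names are all illustrative. None of them is RenalCLIP's published interface.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def zero_shot_classify(image_embs, class_text_embs, temperature=0.01):
    """Assign each image to the class whose text-prompt embedding is most similar.

    image_embs: (n_images, d) outputs of an image encoder
    class_text_embs: (n_classes, d) outputs of a text encoder, one prompt per class
    Returns (predicted class index per image, softmax probabilities over classes).
    """
    img = l2_normalize(np.asarray(image_embs, dtype=float))
    txt = l2_normalize(np.asarray(class_text_embs, dtype=float))
    logits = img @ txt.T / temperature           # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs


if __name__ == "__main__":
    # Hypothetical prompts and toy embeddings standing in for real encoder outputs.
    class_prompts = ["a CT scan of a benign renal mass", "a CT scan of a malignant renal mass"]
    text_embs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    image_embs = np.array([[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]])
    preds, probs = zero_shot_classify(image_embs, text_embs)
    for p in preds:
        print(class_prompts[p])  # benign for the first image, malignant for the second
```

No classifier is trained here. The class set is defined entirely by the prompts, so adding a diagnostic category only requires embedding another prompt.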
Problem

Research questions and friction points this paper is trying to address.

Non-invasive assessment of renal masses to reduce diagnostic uncertainty
Developing a vision-language model for kidney cancer diagnosis and prognosis
Improving generalization and data efficiency in renal mass characterization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-language foundation model for kidney cancer
Two-stage pre-training with contrastive learning
Superior generalization across 10 clinical tasks
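The second pre-training stage aligns the image and text encoders with a contrastive objective. The summary does not give the exact loss, so the following sketch uses the standard CLIP-style symmetric InfoNCE loss as a stand-in. Within a batch of matched scan and report embeddings, each true pair is pulled together and every other pairing is pushed apart. The temperature of 0.07, the function names, and the random toy batch are assumptions. In practice, the embeddings would come from the domain-adapted encoders produced by the first stage.

```python
import numpy as np


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched (image, text) pairs.

    Row i of image_embs and row i of text_embs are assumed to come from the same
    study (e.g. a CT scan and its report); all other rows act as negatives.
    """
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) cosine-similarity matrix
    diag = np.arange(logits.shape[0])   # positive pairs sit on the diagonal
    loss_img_to_txt = -log_softmax(logits, axis=1)[diag, diag].mean()  # each image picks its text
    loss_txt_to_img = -log_softmax(logits, axis=0)[diag, diag].mean()  # each text picks its image
    return (loss_img_to_txt + loss_txt_to_img) / 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(8, 16))
    aligned_texts = images + 0.05 * rng.normal(size=(8, 16))  # near-copies: well-aligned pairs
    shuffled_texts = aligned_texts[::-1]                      # deliberately mismatched pairs
    print(clip_contrastive_loss(images, aligned_texts))   # low loss
    print(clip_contrastive_loss(images, shuffled_texts))  # much higher loss
```

The loss is averaged over both directions, image-to-text and text-to-image. This symmetry is what lets a single pair of encoders serve both image-text retrieval and zero-shot prompting.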
Authors

Yuhui Tao
Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai, 200032, China; Shanghai Key Laboratory of Medical Imaging Computing and Computer Assisted Intervention, Shanghai, 200032, China.

Zhongwei Zhao
Department of Urology, Qilu Hospital of Shandong University, Jinan, Shandong, 250012, China.

Zilong Wang
Microsoft Research Asia, Shanghai, 200232, China.

Xufang Luo
Microsoft Research Asia, Shanghai, 200232, China.

Feng Chen
Department of Radiology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China.

Kang Wang
Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai, 200032, China; Shanghai Key Laboratory of Medical Imaging Computing and Computer Assisted Intervention, Shanghai, 200032, China.

Chuanfu Wu
Center of Health data science, Linyi People’s Hospital, Shandong, 276003, China; Shandong Open Laboratory of Data Innovation Application, Shandong, 276003, China.

Xue Zhang
Department of Radiology, the First People’s Hospital of Lianyungang, Lianyungang, 222002, China.

Shaoting Zhang
Shanghai AI Lab; SenseTime Research

Jiaxi Yao
Department of Urology, Zhangye People’s Hospital affiliated to Hexi University, Zhangye, 734000, China.

Xingwei Jin
Department of Urology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.

Xinyang Jiang
Microsoft Research Asia

Yifan Yang
Microsoft Research Asia, Shanghai, 200232, China.

Dongsheng Li
Microsoft Research Asia, Shanghai, 200232, China.

Lili Qiu
NAI Fellow, ACM Fellow, IEEE Fellow, Professor, Dept. of Computer Science, The University of Texas

Zhiqiang Shao
Department of Urology, Linyi People’s Hospital, Shandong, 276003, China.

Jianming Guo
Department of Urology, Zhongshan Hospital, Fudan University, Shanghai, 200032, China.

Nengwang Yu
Department of Urology, Qilu Hospital of Shandong University, Jinan, Shandong, 250012, China.

Shuo Wang
Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai, 200032, China; Shanghai Key Laboratory of Medical Imaging Computing and Computer Assisted Intervention, Shanghai, 200032, China.

Ying Xiong
Clausthal University of Technology