RobSurv: Vector Quantization-Based Multi-Modal Learning for Robust Cancer Survival Prediction

πŸ“… 2025-05-05
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Current multimodal CT/PET-based cancer survival prediction models suffer from poor clinical generalizability due to sensitivity to imaging noise, inter-scanner protocol variations, and cross-modal feature inconsistency. To address this, we propose a dual-path vector quantization (VQ) architecture: a discrete codebook path enhances noise robustness, while a continuous path preserves fine-grained anatomical details; combined with block-level local fusion and Transformer-based global modeling, it yields consistent multimodal feature representations. Our method integrates dual-stream encoding, patch-wise feature fusion, and an enhanced DeepSurv framework for survival analysis. Evaluated on the HECKTOR, H&N1, and NSCLC datasets, our approach achieves C-indices of 0.771, 0.742, and 0.734, respectively. Under strong noise corruption, performance degrades by only 3.8–4.5%, compared with 8–12% for baseline methods. This work delivers a robust and generalizable solution for multimodal medical imaging–driven survival prediction.
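As a rough illustration of the dual-path idea summarized above, the sketch below pairs a VQ-VAE-style nearest-codebook lookup (with a straight-through estimator) with a parallel continuous branch and a simple concat-and-project fusion. The module names, codebook size, and fusion choice are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn


class DualPathVQ(nn.Module):
    """Illustrative dual-path block: a discrete codebook path for noise
    robustness plus a continuous path that keeps fine-grained detail."""

    def __init__(self, dim=256, codebook_size=512):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)   # learned discrete codes
        self.continuous = nn.Sequential(                    # parallel continuous path
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        self.fuse = nn.Linear(2 * dim, dim)                 # concat-and-project fusion

    def forward(self, z):                                   # z: (batch, tokens, dim)
        # Discrete path: nearest-codebook lookup (VQ-VAE style).
        flat = z.reshape(-1, z.size(-1))
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=-1)
        z_q = self.codebook(idx).view_as(z)
        # Straight-through estimator so gradients still reach the encoder.
        z_q = z + (z_q - z).detach()
        # Continuous path preserves fine-grained anatomical detail.
        z_c = self.continuous(z)
        # Combine the two representations.
        return self.fuse(torch.cat([z_q, z_c], dim=-1))


# Example: 2 scans, 64 patch tokens each, 256-d features.
print(DualPathVQ()(torch.randn(2, 64, 256)).shape)          # torch.Size([2, 64, 256])
```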

πŸ“ Abstract
Cancer survival prediction using multi-modal medical imaging presents a critical challenge in oncology, mainly due to the vulnerability of deep learning models to noise and protocol variations across imaging centers. Current approaches struggle to extract consistent features from heterogeneous CT and PET images, limiting their clinical applicability. We address these challenges by introducing RobSurv, a robust deep learning framework that leverages vector quantization for resilient multi-modal feature learning. The key innovation of our approach lies in its dual-path architecture: one path maps continuous imaging features to learned discrete codebooks for noise-resistant representation, while the parallel path preserves fine-grained details through continuous feature processing. This dual representation is integrated through a novel patch-wise fusion mechanism that maintains local spatial relationships while capturing global context via Transformer-based processing. In extensive evaluations across three diverse datasets (HECKTOR, H&N1, and NSCLC Radiogenomics), RobSurv demonstrates superior performance, achieving concordance indices of 0.771, 0.742, and 0.734, respectively, significantly outperforming existing methods. Most notably, our model maintains robust performance even under severe noise conditions, with performance degradation of only 3.8–4.5% compared to 8–12% in baseline methods. These results, combined with strong generalization across different cancer types and imaging protocols, establish RobSurv as a promising solution for reliable clinical prognosis that can enhance treatment planning and patient care.
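For context on the metric: the concordance index (C-index) is the fraction of comparable patient pairs whose predicted risks are ordered consistently with their observed survival times. Below is a minimal self-contained sketch, not the authors' evaluation code, where a higher risk score means shorter expected survival and ties in event times are ignored for brevity.

```python
import numpy as np


def concordance_index(times, events, risks):
    """C-index: among comparable pairs (the earlier time is an observed event),
    count pairs where the higher predicted risk has the shorter survival time;
    ties in risk score count as 0.5."""
    times, events, risks = map(np.asarray, (times, events, risks))
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:            # censored subjects cannot anchor a pair
            continue
        for j in range(n):
            if times[i] < times[j]:  # comparable: i failed before j's follow-up time
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy example: perfectly ordered risks give a C-index of 1.0.
print(concordance_index([2, 5, 7, 9], [1, 1, 0, 1], [0.9, 0.6, 0.4, 0.1]))
```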
Problem

Research questions and friction points this paper is trying to address.

Robust cancer survival prediction using multi-modal medical imaging
Overcoming noise and protocol variations in deep learning models
Extracting consistent features from heterogeneous CT and PET images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vector quantization for noise-resistant feature learning
Dual-path architecture with discrete and continuous processing
Transformer-based patch-wise fusion for spatial context (see the sketch below)
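A minimal sketch of what patch-wise fusion followed by Transformer-based global modeling could look like; the co-located CT/PET token pairing, learned positional embedding, and layer sizes are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn


class PatchFusionTransformer(nn.Module):
    """Illustrative fusion: pair up co-located CT and PET patch tokens locally,
    then let a Transformer encoder model global context across patches."""

    def __init__(self, dim=256, depth=4, heads=8, num_patches=64):
        super().__init__()
        self.local_fuse = nn.Linear(2 * dim, dim)            # block-level (patch-wise) fusion
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, ct_tokens, pet_tokens):                # each: (batch, patches, dim)
        fused = self.local_fuse(torch.cat([ct_tokens, pet_tokens], dim=-1))
        fused = fused + self.pos                             # learned positional embedding
        global_tokens = self.encoder(fused)                  # global context across patches
        return global_tokens.mean(dim=1)                     # pooled feature for the survival head


# Example: 2 scans, 64 patches per scan, 256-d patch features.
ct, pet = torch.randn(2, 64, 256), torch.randn(2, 64, 256)
print(PatchFusionTransformer()(ct, pet).shape)               # torch.Size([2, 256])
```

Fusing per-patch before the encoder keeps local CT/PET correspondence, while self-attention supplies the global context the summary refers to.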
Aiman Farooq
Indian Institute of Technology Jodhpur
Azad Singh
Indian Institute of Technology Jodhpur
Deepak Mishra
Indian Institute of Technology Jodhpur
Santanu Chaudhury
IIT Delhi
Computer Vision · Computational Intelligence · Multimedia Systems · Robotics