🤖 AI Summary
Existing cancer survival prediction models struggle to adapt to dynamic clinical data streams, and multimodal continual learning remains unexplored in survival analysis. Unimodal continual learning approaches suffer from catastrophic forgetting and fail to capture the evolving interactions between whole-slide images and genomic data. To address these challenges, we propose ConSurv, the first multimodal continual learning framework for survival analysis, which integrates a Multi-Stage Mixture of Experts (MS-MoE) with Feature-Constrained Replay (FCR). ConSurv enforces feature-level constraints jointly at the encoder and fusion stages to enable stable cross-modal knowledge transfer. Evaluated on our newly constructed MSAIL benchmark for multimodal incremental learning, ConSurv consistently outperforms state-of-the-art methods on key survival metrics, including the C-index and Integrated Brier Score, demonstrating superior long-term stability and generalization.
📝 Abstract
Survival prediction for cancer is crucial in clinical practice, as it informs mortality risk and influences treatment plans. However, a static model trained on a single dataset fails to adapt to the dynamically evolving clinical environment and continuous data streams, limiting its practical utility. While continual learning (CL) offers a way to learn dynamically from new datasets, existing CL methods focus primarily on unimodal inputs and suffer from severe catastrophic forgetting in survival prediction. In real-world scenarios, multimodal inputs such as whole slide images and genomics often provide comprehensive and complementary information, and neglecting inter-modal correlations degrades performance. To address the two challenges of catastrophic forgetting and complex inter-modal interactions between gigapixel whole slide images and genomics, we propose ConSurv, the first multimodal continual learning (MMCL) method for survival analysis. ConSurv incorporates two key components: a Multi-Stage Mixture of Experts (MS-MoE) and Feature-Constrained Replay (FCR). MS-MoE captures both task-shared and task-specific knowledge at different learning stages of the network, including the two modality encoders and the modality fusion component, thereby learning inter-modal relationships. FCR further consolidates learned knowledge and mitigates forgetting by restricting the feature deviation of previous data at different levels, including the encoder-level features of both modalities and the fusion-level representations. Additionally, we introduce a new benchmark, Multimodal Survival Analysis Incremental Learning (MSAIL), integrating four datasets for comprehensive evaluation in the CL setting. Extensive experiments demonstrate that ConSurv outperforms competing methods across multiple metrics.
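The two components described above can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the expert/gate shapes, the per-level weights, and the choice of a mean-squared deviation penalty are all assumptions made for illustration. The first function is a generic softmax-gated mixture-of-experts layer of the kind MS-MoE would place at each stage (WSI encoder, genomics encoder, fusion); the second is a feature-constraint term in the spirit of FCR, penalizing how far replayed samples' features drift from their stored values at each of the three levels.

```python
import numpy as np

def moe_layer(x, expert_ws, gate_w):
    """Softmax-gated mixture of linear experts (illustrative only).

    x:         (batch, d) stage input features
    expert_ws: list of (d, d) per-expert weight matrices
    gate_w:    (d, n_experts) gating weights
    Returns a (batch, d) gate-weighted combination of expert outputs.
    """
    logits = x @ gate_w
    gates = np.exp(logits - logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)           # (batch, n_experts)
    outs = np.stack([x @ w for w in expert_ws], axis=1)  # (batch, n_experts, d)
    return np.einsum("be,bed->bd", gates, outs)

def fcr_penalty(current, stored, weights=(1.0, 1.0, 1.0)):
    """Hypothetical FCR term: weighted mean-squared deviation between
    current and stored features at the WSI-encoder, genomics-encoder,
    and fusion levels (the weights are assumptions, not the paper's)."""
    return sum(w * float(np.mean((c - s) ** 2))
               for w, c, s in zip(weights, current, stored))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_experts = 8, 3
    x = rng.normal(size=(4, d))
    experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
    gate = rng.normal(size=(d, n_experts))
    fused = moe_layer(x, experts, gate)                  # (4, 8)
    # Identical replayed features incur zero penalty; drifted ones do not.
    feats = [rng.normal(size=(4, d)) for _ in range(3)]
    print(fcr_penalty(feats, [f.copy() for f in feats]))  # 0.0
    print(fcr_penalty([f + 0.1 for f in feats], feats) > 0)
```

In training, a penalty of this shape would be added to the survival loss on replayed samples, so that new-task updates cannot freely overwrite the representations earlier tasks depend on.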