🤖 AI Summary
Current cancer survival prediction models face two key limitations in multimodal fusion: (1) fixed fusion strategies (e.g., concatenation, attention) tie the model to predefined feature combinations and hinder dynamic cross-modal interaction; (2) Mixture-of-Experts (MoE)-based approaches assign each decoupled feature to its own expert, impeding information exchange among the decoupled features. To address these, we propose the Decoupling-Reorganization-Fusion (DeReF) framework. First, a modality decoupling module separates heterogeneous features into disentangled representations. Second, a random feature reorganization mechanism breaks predefined fusion pathways, enhancing feature diversity and inter-expert information flow. Third, a regional cross-attention network inside the decoupling module improves the quality of the decoupled features, and a dynamic MoE fusion module makes the fusion adaptive. Evaluated on an in-house hepatocellular carcinoma dataset and three TCGA cohorts, DeReF is reported to improve the concordance index (C-index) and calibration metrics, suggesting good generalizability and clinical applicability.
📝 Abstract
Cancer survival analysis commonly integrates information across diverse medical modalities to predict survival time. Existing methods primarily focus on extracting decoupled features from each modality and fusing them through operations such as concatenation, attention, or Mixture-of-Experts (MoE) fusion. However, these methods still face two key challenges: i) fixed fusion schemes (concatenation and attention) can make the model over-reliant on predefined feature combinations, limiting the dynamic fusion of decoupled features; ii) in MoE-based fusion methods, each expert network handles a separate decoupled feature, which limits information interaction among the decoupled features. To address these challenges, we propose a novel Decoupling-Reorganization-Fusion framework (DeReF), which inserts a random feature reorganization strategy between the modality decoupling module and the dynamic MoE fusion module. Its advantages are: i) it increases the diversity and granularity of feature combinations, enhancing the generalization ability of the subsequent expert networks; ii) it overcomes the problem of information closure, in which each expert sees only its own decoupled feature, and helps expert networks better capture information shared among decoupled features. Additionally, we incorporate a regional cross-attention network within the modality decoupling module to improve the representation quality of decoupled features. Extensive experimental results on our in-house Liver Cancer (LC) dataset and three widely used public TCGA datasets confirm the effectiveness of our proposed method. The code will be made publicly available.
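The reorganize-then-fuse idea described above can be illustrated with a small NumPy sketch. This is not the authors' implementation, which has not been released at the time of the abstract. It assumes four decoupled feature vectors, a chunk-level shuffle as the random reorganization, and a dense softmax-gated mixture as the dynamic MoE. The names `reorganize`, `moe_fuse`, `expert_weights`, and `gate_weights` are hypothetical, as are all shapes.

```python
import numpy as np


def reorganize(features, num_chunks, rng):
    """Randomly regroup chunks of decoupled features (assumed strategy).

    features: array of shape (K, D), one row per decoupled feature.
    Each row is cut into `num_chunks` equal chunks; all K * num_chunks
    chunks are shuffled and reassembled into K new rows of length D,
    so every new row can mix content from several original features.
    """
    k, d = features.shape
    if d % num_chunks != 0:
        raise ValueError("feature dimension must be divisible by num_chunks")
    chunks = features.reshape(k * num_chunks, d // num_chunks)
    perm = rng.permutation(k * num_chunks)
    return chunks[perm].reshape(k, d)


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def moe_fuse(inputs, expert_weights, gate_weights):
    """Dense softmax-gated mixture of experts (illustrative only).

    inputs:         (K, D)    reorganized features, one row per expert
    expert_weights: (K, H, D) one linear map per expert
    gate_weights:   (K, K*D)  gate scores computed from all rows jointly
    Returns the fused vector of shape (H,) and the gate weights (K,).
    """
    k = inputs.shape[0]
    gate = softmax(gate_weights @ inputs.reshape(-1))
    expert_out = np.stack(
        [np.tanh(expert_weights[i] @ inputs[i]) for i in range(k)]
    )
    return gate @ expert_out, gate


rng = np.random.default_rng(0)
K, D, H = 4, 8, 5  # hypothetical sizes
features = rng.normal(size=(K, D))            # stand-in for decoupled features
mixed = reorganize(features, num_chunks=4, rng=rng)
expert_weights = rng.normal(size=(K, H, D))   # random, untrained weights
gate_weights = rng.normal(size=(K, K * D))
fused, gate = moe_fuse(mixed, expert_weights, gate_weights)
```

Because the shuffle only permutes chunks, no feature content is lost or duplicated; each expert simply receives a different mixture of the original decoupled features on every draw, which is the property the abstract credits for the improved information flow between experts.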