AI Summary
Multimodal cancer survival prediction often struggles to balance predictive performance with model interpretability. To address this challenge, this work proposes the DIMAFx framework, which disentangles whole-slide histopathology images and transcriptomic data into modality-specific and modality-shared representations. In doing so, DIMAFx achieves state-of-the-art predictive accuracy while improving model transparency, demonstrating that high performance and interpretability need not be mutually exclusive in multimodal survival analysis. Combined with SHAP-based feature attribution, the framework reveals biologically meaningful interactions between modalities. Validation across multiple cancer cohorts shows that DIMAFx identifies shared risk features in breast cancer associated with high-grade morphology and estrogen response pathways, as well as modality-specific signals linked to the tumor microenvironment.
Abstract
While multimodal survival prediction models are increasingly accurate, their complexity often reduces interpretability, limiting insight into how different data sources influence predictions. To address this, we introduce DIMAFx, an explainable multimodal framework for cancer survival prediction that produces disentangled, interpretable modality-specific and modality-shared representations from histopathology whole-slide images and transcriptomic data. Across multiple cancer cohorts, DIMAFx achieves state-of-the-art performance and improved representation disentanglement. Leveraging its interpretable design together with SHapley Additive exPlanations (SHAP), DIMAFx systematically reveals key multimodal interactions and the biological information encoded in the disentangled representations. In breast cancer survival prediction, the most predictive features carry modality-shared information, including one that captures solid tumor morphology contextualized primarily by late estrogen response, where higher-grade morphology aligns with pathway upregulation and increased risk, consistent with known breast cancer biology. Key modality-specific features capture microenvironmental signals from interacting adipose and stromal morphologies. These results show that multimodal models can overcome the traditional trade-off between performance and explainability, supporting their application in precision medicine.