AI Summary
Multimodal cancer survival prediction often struggles to balance predictive performance with model interpretability. To address this challenge, this work proposes the DIMAFx framework, which disentangles whole-slide histopathology images and transcriptomic data into modality-specific and modality-shared representations. In doing so, DIMAFx achieves state-of-the-art predictive accuracy while improving model transparency, demonstrating that high performance and interpretability need not be mutually exclusive in multimodal survival analysis. Combined with SHAP-based feature attribution, the framework reveals biologically meaningful interactions between modalities. Validation across multiple cancer cohorts shows that DIMAFx identifies shared risk features in breast cancer associated with high-grade morphology and estrogen response pathways, as well as modality-specific signals linked to the tumor microenvironment.
Abstract
While multimodal survival prediction models are increasingly accurate, their complexity often reduces interpretability, limiting insight into how different data sources influence predictions. To address this, we introduce DIMAFx, an explainable multimodal framework for cancer survival prediction that produces disentangled, interpretable modality-specific and modality-shared representations from histopathology whole-slide images and transcriptomic data. Across multiple cancer cohorts, DIMAFx achieves state-of-the-art performance and improved representation disentanglement. Leveraging its interpretable design together with SHapley Additive exPlanations (SHAP), DIMAFx systematically reveals key multimodal interactions and the biological information encoded in the disentangled representations. In breast cancer survival prediction, the most predictive features carry modality-shared information, including one that captures solid tumor morphology contextualized primarily by late estrogen response, where higher-grade morphology aligns with pathway upregulation and increased risk, consistent with known breast cancer biology. Key modality-specific features capture microenvironmental signals from interacting adipose and stromal morphologies. These results show that multimodal models can overcome the traditional trade-off between performance and explainability, supporting their application in precision medicine.