Disentangled and Interpretable Multimodal Attention Fusion for Cancer Survival Prediction

📅 2025-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the entanglement of modality-shared and modality-specific representations, and the resulting lack of interpretability, in multimodal fusion of whole-slide images (WSIs) and transcriptomic data for cancer survival prediction, this paper proposes a disentangled multimodal learning framework (DIMAF). The authors design an intra- and inter-modal attention fusion mechanism to explicitly model both modality-specific and cross-modal shared interactions, and introduce a distance correlation-based disentanglement loss that pushes these representations toward statistical independence. Shapley values quantify the contribution of each modality, as well as of individual genes and histopathological regions, to the survival prediction. Evaluated on four public datasets, the method achieves a relative average improvement of 1.85% in C-index and 23.7% in representation disentanglement over state-of-the-art multimodal models, improving predictive performance and biological interpretability while supporting downstream biological discovery.
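As a concrete illustration of the fusion mechanism, the following PyTorch sketch separates self-attention (intra-modal, modality-specific) streams from cross-attention (inter-modal, modality-shared) streams. Layer sizes, pooling, and all names here are illustrative assumptions, not the authors' exact DIMAF architecture.

```python
# Minimal sketch of intra-/inter-modal attention fusion.
# Illustrative only: dimensions, pooling, and module names are assumptions,
# not the exact DIMAF architecture.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Separate blocks so modality-specific (intra) and modality-shared
        # (inter) interactions are modelled explicitly and can be disentangled.
        self.intra_wsi = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.intra_rna = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter_wsi_rna = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter_rna_wsi = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, wsi: torch.Tensor, rna: torch.Tensor):
        # wsi: (B, n_patches, dim) patch embeddings from the slide
        # rna: (B, n_genes, dim) transcriptomic token embeddings
        spec_wsi, _ = self.intra_wsi(wsi, wsi, wsi)       # WSI-specific
        spec_rna, _ = self.intra_rna(rna, rna, rna)       # RNA-specific
        shared_wr, _ = self.inter_wsi_rna(wsi, rna, rna)  # WSI attends to RNA
        shared_rw, _ = self.inter_rna_wsi(rna, wsi, wsi)  # RNA attends to WSI
        # Mean-pool each stream into one vector per representation.
        return [z.mean(dim=1) for z in (spec_wsi, spec_rna, shared_wr, shared_rw)]
```

The four pooled vectors can then be concatenated for a survival head, while a disentanglement loss (sketched after the abstract below) keeps the specific and shared streams apart.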

📝 Abstract
To improve the prediction of cancer survival using whole-slide images and transcriptomics data, it is crucial to capture both modality-shared and modality-specific information. However, multimodal frameworks often entangle these representations, limiting interpretability and potentially suppressing discriminative features. To address this, we propose Disentangled and Interpretable Multimodal Attention Fusion (DIMAF), a multimodal framework that separates the intra- and inter-modal interactions within an attention-based fusion mechanism to learn distinct modality-specific and modality-shared representations. We introduce a loss based on Distance Correlation to promote disentanglement between these representations and integrate Shapley additive explanations to assess their relative contributions to survival prediction. We evaluate DIMAF on four public cancer survival datasets, achieving a relative average improvement of 1.85% in performance and 23.7% in disentanglement compared to current state-of-the-art multimodal models. Beyond improved performance, our interpretable framework enables a deeper exploration of the underlying interactions between and within modalities in cancer biology.
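The disentanglement loss builds on sample distance correlation (Székely et al.), which is zero only when two representations are statistically independent. Below is a minimal sketch assuming two batched representations of shape (n, d); how DIMAF pairs and weights its representation streams is not reproduced here.

```python
# Minimal sketch of a distance-correlation disentanglement loss,
# using the standard biased sample estimator. How DIMAF pairs and
# weights the representation streams is left out here.
import torch

def distance_correlation(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-9):
    # Pairwise Euclidean distance matrices, shape (n, n).
    a = torch.cdist(x, x)
    b = torch.cdist(y, y)
    # Double-center: subtract row/column means, add back the grand mean.
    A = a - a.mean(0, keepdim=True) - a.mean(1, keepdim=True) + a.mean()
    B = b - b.mean(0, keepdim=True) - b.mean(1, keepdim=True) + b.mean()
    # Sample distance covariance and variances.
    dcov2 = (A * B).mean()
    dvar_x = (A * A).mean()
    dvar_y = (B * B).mean()
    return dcov2 / (dvar_x.sqrt() * dvar_y.sqrt() + eps)

# As a loss term: minimizing dCor(z_specific, z_shared) pushes the two
# representation spaces toward statistical independence.
```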
Problem

Research questions and friction points this paper is trying to address.

Improve cancer survival prediction using multimodal data.
Disentangle modality-specific and shared representations for better interpretability.
Enhance both predictive performance and the understanding of interactions within and between modalities in cancer biology.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Separates intra- and inter-modal interactions within an attention-based fusion mechanism.
Uses a Distance Correlation loss to disentangle modality-specific and shared representations.
Integrates Shapley additive explanations for interpretability (see the sketch after this list).
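As a toy illustration of Shapley attribution at the modality level: with only two players (WSI and RNA), the Shapley value has a closed form over four coalition evaluations. The zero-baseline masking and the callable f are assumptions for illustration; the paper's SHAP analysis additionally attributes individual genes and histopathological regions.

```python
# Toy sketch of exact two-player Shapley values at the modality level.
# Assumes a model f(wsi, rna) -> risk score whose inputs can be masked
# with a zero baseline; an illustration, not the paper's exact setup.
import torch

def modality_shapley(f, wsi: torch.Tensor, rna: torch.Tensor):
    zw, zr = torch.zeros_like(wsi), torch.zeros_like(rna)
    v_none = f(zw, zr)    # empty coalition (baseline prediction)
    v_w = f(wsi, zr)      # WSI alone
    v_r = f(zw, rna)      # RNA alone
    v_both = f(wsi, rna)  # full model
    # Average each modality's marginal contribution over both join orders.
    phi_wsi = 0.5 * (v_w - v_none) + 0.5 * (v_both - v_r)
    phi_rna = 0.5 * (v_r - v_none) + 0.5 * (v_both - v_w)
    return phi_wsi, phi_rna  # efficiency: phi_wsi + phi_rna == v_both - v_none
```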
Aniek Eijpe
AI Technology for Life, Department of Information and Computing Sciences, Department of Biology, Utrecht University, Utrecht, The Netherlands
Soufyan Lakbir
AI Technology for Life, Department of Information and Computing Sciences, Department of Biology, Utrecht University, Utrecht, The Netherlands
Melis Erdal Cesur
Computational Pathology group, Department of Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
Sara P. Oliveira
Postdoctoral researcher | Computational Pathology group | The Netherlands Cancer Institute
Computational Pathology, Medical Image Analysis, Deep Learning, Computer Vision
Sanne Abeln
Professor of AI Technology for Life, Utrecht University
AI for the Life Sciences, Protein Bioinformatics, Genomic Alterations, Neurodegenerative Disease
Wilson Silva
Assistant Professor, AI Technology for Life, Utrecht University
Machine Learning, Computer Vision, Explainable AI, Medical Image Analysis, Privacy