Measuring Cross-Modal Interactions in Multimodal Models

📅 2024-12-20
🏛️ arXiv.org
🤖 AI Summary
Existing interpretability methods for medical AI struggle to quantify cross-modal interactions in multimodal models that jointly leverage heterogeneous data (e.g., imaging, text, physiological signals), and they lack individualised, sample-level attribution. To address this, the paper proposes InterSHAP, an unsupervised interaction attribution framework grounded in the Shapley interaction index. InterSHAP separates unimodal contributions from cross-modal interactions without requiring ground-truth labels or assumptions about model performance. Through an open-source implementation integrated with the SHAP package, the authors validate InterSHAP on real-world clinical multimodal datasets, where it delivers fine-grained, instance-level explanations, identifies modality-specific and synergistic interaction patterns, and improves clinical trustworthiness and model debuggability.

📝 Abstract
Integrating AI in healthcare can greatly improve patient care and system efficiency. However, the lack of explainability in AI systems (XAI) hinders their clinical adoption, especially in multimodal settings that use increasingly complex model architectures. Most existing XAI methods focus on unimodal models, which fail to capture cross-modal interactions crucial for understanding the combined impact of multiple data sources. Existing methods for quantifying cross-modal interactions are limited to two modalities, rely on labelled data, and depend on model performance. This is problematic in healthcare, where XAI must handle multiple data sources and provide individualised explanations. This paper introduces InterSHAP, a cross-modal interaction score that addresses the limitations of existing approaches. InterSHAP uses the Shapley interaction index to precisely separate and quantify the contributions of the individual modalities and their interactions without approximations. By integrating an open-source implementation with the SHAP package, we enhance reproducibility and ease of use. We show that InterSHAP accurately measures the presence of cross-modal interactions, can handle multiple modalities, and provides detailed explanations at a local level for individual samples. Furthermore, we apply InterSHAP to multimodal medical datasets and demonstrate its applicability for individualised explanations.
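The quantity at the heart of the abstract, the Shapley interaction index between two modalities, can be written down directly: it averages v(S∪{i,j}) − v(S∪{i}) − v(S∪{j}) + v(S) over coalitions S of the remaining modalities. The sketch below is a minimal illustration of that formula, not the authors' implementation; `toy_value` is a hypothetical masking-style value function in which modalities 0 and 1 carry a purely synergistic bonus.

```python
from itertools import combinations
from math import factorial

def shapley_interaction(value, n, i, j):
    """Shapley interaction index between players i and j.

    value: maps a frozenset of player (modality) indices to a model score.
    n: total number of players.
    """
    others = set(range(n)) - {i, j}
    total = 0.0
    for size in range(len(others) + 1):
        # Coalition weight for the interaction index.
        w = factorial(size) * factorial(n - size - 2) / factorial(n - 1)
        for S in combinations(others, size):
            S = frozenset(S)
            # Discrete second difference: synergy of {i, j} on top of S.
            delta = (value(S | {i, j}) - value(S | {i})
                     - value(S | {j}) + value(S))
            total += w * delta
    return total

def toy_value(S):
    # Hypothetical score when only modalities in S are unmasked:
    # each modality alone adds 0.1; modalities 0 and 1 together add a
    # 0.5 bonus, i.e. a purely cross-modal interaction.
    score = 0.1 * len(S)
    if {0, 1} <= S:
        score += 0.5
    return score

# With two modalities the sum collapses to a single term:
# v({0,1}) - v({0}) - v({1}) + v(∅), recovering exactly the 0.5 synergy.
print(round(shapley_interaction(toy_value, 2, 0, 1), 6))  # → 0.5
```

The same call generalises beyond two modalities, which is the limitation of prior methods the abstract highlights: with a third modality present, `shapley_interaction(toy_value, 3, 0, 1)` still isolates the 0.5 synergy between modalities 0 and 1.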
Problem

Research questions and friction points this paper is trying to address:
Artificial Intelligence, Medical Applications, Interpretable Models

Innovation

Methods, ideas, or system contributions that make the work stand out:
InterSHAP, multi-modal explainability, AI healthcare integration
👥 Authors

Laura Wenderoth
Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom

Konstantin Hemker
University of Cambridge
Interests: Multimodal ML, Representation Learning, Explainable AI for Medicine

Nikola Simidjievski
Télécom Paris, Institut Polytechnique de Paris, France | University of Cambridge, UK
Interests: machine learning, multimodal learning, ML for healthcare, breast cancer, equation discovery

M. Jamnik
Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom