Measuring Cross-Modal Interactions in Multimodal Models

📅 2024-12-20
🏛️ arXiv.org
🤖 AI Summary
Existing interpretability methods for medical AI struggle to quantify cross-modal interactions in multimodal models that jointly leverage heterogeneous data (e.g., imaging, text, physiological signals), and they lack individualised, sample-level attribution. To address this, the paper proposes InterSHAP, an unsupervised interaction attribution framework grounded in the Shapley interaction index. InterSHAP separates unimodal contributions from cross-modal interactions without requiring ground-truth labels or assumptions about model performance. Through an open-source implementation integrated with the SHAP package, the authors validate InterSHAP on real-world clinical multimodal datasets, where it delivers fine-grained, instance-level explanations, identifies modality-specific and synergistic interaction patterns, and improves clinical trustworthiness and model debuggability.

📝 Abstract
Integrating AI in healthcare can greatly improve patient care and system efficiency. However, the lack of explainability in AI systems (XAI) hinders their clinical adoption, especially in multimodal settings that use increasingly complex model architectures. Most existing XAI methods focus on unimodal models, which fail to capture cross-modal interactions crucial for understanding the combined impact of multiple data sources. Existing methods for quantifying cross-modal interactions are limited to two modalities, rely on labelled data, and depend on model performance. This is problematic in healthcare, where XAI must handle multiple data sources and provide individualised explanations. This paper introduces InterSHAP, a cross-modal interaction score that addresses the limitations of existing approaches. InterSHAP uses the Shapley interaction index to precisely separate and quantify the contributions of the individual modalities and their interactions without approximations. By integrating an open-source implementation with the SHAP package, we enhance reproducibility and ease of use. We show that InterSHAP accurately measures the presence of cross-modal interactions, can handle multiple modalities, and provides detailed explanations at a local level for individual samples. Furthermore, we apply InterSHAP to multimodal medical datasets and demonstrate its applicability for individualised explanations.
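The quantity at the heart of the abstract, the Shapley interaction index between two modalities, can be written down directly: it averages v(S∪{i,j}) − v(S∪{i}) − v(S∪{j}) + v(S) over coalitions S of the remaining modalities. The sketch below is a minimal illustration of that formula, not the authors' implementation; `toy_value` is a hypothetical masking-style value function in which modalities 0 and 1 carry a purely synergistic bonus.

```python
from itertools import combinations
from math import factorial

def shapley_interaction(value, n, i, j):
    """Shapley interaction index between players i and j.

    value: maps a frozenset of player (modality) indices to a model score.
    n: total number of players.
    """
    others = set(range(n)) - {i, j}
    total = 0.0
    for size in range(len(others) + 1):
        # Coalition weight for the interaction index.
        w = factorial(size) * factorial(n - size - 2) / factorial(n - 1)
        for S in combinations(others, size):
            S = frozenset(S)
            # Discrete second difference: synergy of {i, j} on top of S.
            delta = (value(S | {i, j}) - value(S | {i})
                     - value(S | {j}) + value(S))
            total += w * delta
    return total

def toy_value(S):
    # Hypothetical score when only modalities in S are unmasked:
    # each modality alone adds 0.1; modalities 0 and 1 together add a
    # 0.5 bonus, i.e. a purely cross-modal interaction.
    score = 0.1 * len(S)
    if {0, 1} <= S:
        score += 0.5
    return score

# With two modalities the sum collapses to a single term:
# v({0,1}) - v({0}) - v({1}) + v(∅), recovering exactly the 0.5 synergy.
print(round(shapley_interaction(toy_value, 2, 0, 1), 6))  # → 0.5
```

The same call generalises beyond two modalities, which is the limitation of prior methods the abstract highlights: with a third modality present, `shapley_interaction(toy_value, 3, 0, 1)` still isolates the 0.5 synergy between modalities 0 and 1.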
Problem

Research questions and friction points this paper is trying to address:
Artificial Intelligence, Medical Applications, Interpretable Models

Innovation

Methods, ideas, or system contributions that make the work stand out:
InterSHAP, multi-modal explainability, AI healthcare integration
👥 Authors

Laura Wenderoth
Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom

Konstantin Hemker
University of Cambridge
Interests: Multimodal ML, Representation Learning, Explainable AI for Medicine

Nikola Simidjievski
Télécom Paris, Institut Polytechnique de Paris, France | University of Cambridge, UK
Interests: machine learning, multimodal learning, ML for healthcare, breast cancer, equation discovery

M. Jamnik
Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom