Beyond Attention Heatmaps: How to Get Better Explanations for Multiple Instance Learning Models in Histopathology

📅 2026-03-09
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the lack of systematic evaluation of explanation methods for multiple instance learning (MIL) models, which are widely used in computational pathology and commonly rely on attention heatmaps for interpretability. The authors propose a general, annotation-free framework to benchmark explanation techniques, including layer-wise relevance propagation (LRP), integrated gradients (IG), and the perturbation-based "Single" method, across classification, regression, and survival analysis tasks and across Attention-, Transformer-, and Mamba-based architectures. Their large-scale evaluation shows that both model architecture and task type strongly influence explanation quality, with LRP, IG, and Single perturbation consistently outperforming conventional attention heatmaps. The best-performing heatmaps are then correlated with spatial transcriptomics data to validate their biological relevance and to uncover divergent decision strategies among models predicting HPV infection status.
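For orientation, the sketch below (a minimal PyTorch example, not taken from the paper's repository; embedding and hidden sizes are illustrative) shows a standard attention-based MIL head. The per-patch attention weights produced by such a model are what is conventionally rendered as an attention heatmap, i.e. the baseline explanation the paper evaluates against.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Minimal attention-based MIL head: patch embeddings -> slide-level prediction.

    The per-patch attention weights are what is typically visualized as an
    attention heatmap over the whole slide image.
    """

    def __init__(self, embed_dim: int = 1024, hidden_dim: int = 256, n_classes: int = 2):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(embed_dim, n_classes)

    def forward(self, patch_embeddings: torch.Tensor):
        # patch_embeddings: (n_patches, embed_dim), e.g. from a frozen patch encoder
        attn = torch.softmax(self.attention(patch_embeddings), dim=0)  # (n_patches, 1)
        slide_embedding = (attn * patch_embeddings).sum(dim=0)         # (embed_dim,)
        logits = self.classifier(slide_embedding)                      # (n_classes,)
        return logits, attn.squeeze(-1)

# Illustrative usage: 5000 patches embedded in 1024 dimensions
model = AttentionMIL()
logits, attention_weights = model(torch.randn(5000, 1024))
```

The paper's finding is that these attention weights, while convenient, often fail to reflect the model's decision mechanism, which motivates attribution methods such as LRP and integrated gradients.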

📝 Abstract
Multiple instance learning (MIL) has enabled substantial progress in computational histopathology, where large numbers of patches from gigapixel whole slide images are aggregated into slide-level predictions. Heatmaps are widely used to validate MIL models and to discover tissue biomarkers. Yet, the validity of these heatmaps has barely been investigated. In this work, we introduce a general framework for evaluating the quality of MIL heatmaps without requiring additional labels. We conduct a large-scale benchmark experiment to assess six explanation methods across histopathology task types (classification, regression, survival), MIL model architectures (Attention-, Transformer-, Mamba-based), and patch encoder backbones (UNI2, Virchow2). Our results show that explanation quality mostly depends on MIL model architecture and task type, with perturbation ("Single"), layer-wise relevance propagation (LRP), and integrated gradients (IG) consistently outperforming attention-based and gradient-based saliency heatmaps, which often fail to reflect model decision mechanisms. We further demonstrate the advanced capabilities of the best-performing explanation methods: (i) we provide a proof-of-concept that MIL heatmaps of a bulk gene expression prediction model can be correlated with spatial transcriptomics for biological validation, and (ii) we showcase the discovery of distinct model strategies for predicting human papillomavirus (HPV) infection from head and neck cancer slides. Our work highlights the importance of validating MIL heatmaps and establishes that improved explainability can enable more reliable model validation and yield biological insights, making a case for broader adoption of explainable AI in digital pathology. Our code is provided in a public GitHub repository: https://github.com/bifold-pathomics/xMIL/tree/xmil-journal
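As an illustration of one of the better-performing attribution methods named above, the sketch below computes integrated gradients per patch for a MIL model like the AttentionMIL example earlier on this page. The zero baseline, the Riemann-sum approximation, and the (logits, attention) return signature are assumptions for this sketch; the paper's actual implementation may differ and is available in the linked repository.

```python
import torch

def integrated_gradients(model, patch_embeddings, target_class, steps: int = 32):
    """Approximate integrated gradients of the target logit w.r.t. each patch
    embedding, using a zero baseline and a Riemann-sum path integral.

    Returns one attribution score per patch (summed over embedding dimensions),
    which can be mapped back to patch coordinates to form a heatmap.
    """
    baseline = torch.zeros_like(patch_embeddings)
    total_grads = torch.zeros_like(patch_embeddings)
    for alpha in torch.linspace(0.0, 1.0, steps):
        # Interpolate between baseline and input, track gradients at this point
        point = (baseline + alpha * (patch_embeddings - baseline)).requires_grad_(True)
        logits, _ = model(point)  # assumes the model returns (logits, attention)
        logits[target_class].backward()
        total_grads += point.grad
    avg_grads = total_grads / steps
    return ((patch_embeddings - baseline) * avg_grads).sum(dim=-1)  # (n_patches,)

# Illustrative usage with the AttentionMIL sketch above
patch_embeddings = torch.randn(5000, 1024)
logits, _ = model(patch_embeddings)
patch_scores = integrated_gradients(model, patch_embeddings, target_class=logits.argmax())
```

Each per-patch score can be placed back at the patch's slide coordinates to produce an attribution heatmap comparable to, and per the paper's benchmark generally more faithful than, the raw attention heatmap.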
Problem

Research questions and friction points this paper is trying to address.

Multiple Instance Learning
Explainability
Heatmap Validation
Histopathology
Model Interpretation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiple Instance Learning
Explainable AI
Heatmap Evaluation
Computational Histopathology
Spatial Transcriptomics
Mina Jamshidi Idaji
Machine learning researcher at BIFOLD, TU Berlin
Machine Learning, Deep Learning, Signal processing, Computational pathology, Neural data analysis
Julius Hense
PhD Student at BIFOLD, TU Berlin
Computational Pathology, Explainable AI, Multimodal Learning, Representation Learning
Tom Neuhäuser
Berlin Institute for the Foundations of Learning and Data, Berlin, Germany; Machine Learning Group, Technische Universität Berlin, Berlin, Germany
Augustin Krause
Machine Learning Group, Technische Universität Berlin, Berlin, Germany
Yanqing Luo
Machine Learning Group, Technische Universität Berlin, Berlin, Germany
Oliver Eberle
TU Berlin
Explainable AI, Interpretability, Deep Learning, Machine Learning, NLP
Thomas Schnake
Technical University of Berlin
Machine Learning
Laure Ciernik
PhD, TU Berlin
Farnoush Rezaei Jafari
Berlin Institute for the Foundations of Learning and Data, Berlin, Germany; Machine Learning Group, Technische Universität Berlin, Berlin, Germany
Reza Vahidimajd
Department of Computer Science and Engineering, The Chinese University of Hong Kong
Jonas Dippel
TU Berlin
Christoph Walz
Institute of Pathology, Ludwig Maximilian University, Munich, Germany
Frederick Klauschen
Institute of Pathology, University of Munich (LMU)
Pathology, Digital Pathology/AI, Precision Medicine, Molecular Diagnostics, Bioinformatics
Andreas Mock
Institute of Pathology, Ludwig Maximilian University, Munich, Germany; German Cancer Research Center, Heidelberg, and German Cancer Consortium, Munich, Germany
Klaus-Robert Müller
TU Berlin & Korea University & Google DeepMind & Max Planck Institute for Informatics, Germany
Machine learning, artificial intelligence, big data, computational neuroscience