🤖 AI Summary
Existing multiple instance learning (MIL) methods for whole-slide image (WSI) analysis suffer from two critical limitations: a lack of causal interpretability and susceptibility to demographic bias, both of which hinder fair clinical deployment. To address these, we propose the first causality-aware MIL framework, combining (1) a structured causal graph that explicitly models confounders such as age, sex, and race, and (2) do-calculus and counterfactual reasoning that disentangle true disease signals from spurious demographic associations, enabling debiased learning. Our method jointly leverages attention mechanisms and causal inference. Evaluated on CAMELYON16, TCGA-Lung, and TCGA-Multi, it achieves state-of-the-art performance, improves fairness metrics by over 65%, and raises the survival prediction concordance index (C-index) by 0.017 on average. Ablation studies confirm the causal graph's pivotal role in enhancing both interpretability and fairness.
📝 Abstract
Multiple instance learning (MIL) has emerged as the dominant paradigm for whole slide image (WSI) analysis in computational pathology, achieving strong diagnostic performance through patch-level feature aggregation. However, existing MIL methods face critical limitations: (1) they rely on attention mechanisms that lack causal interpretability, and (2) they fail to integrate patient demographics (age, gender, race), raising fairness concerns across diverse populations. These shortcomings hinder clinical translation, where algorithmic bias can exacerbate health disparities. We introduce MeCaMIL, a causality-aware MIL framework that explicitly models demographic confounders through structured causal graphs. Unlike prior approaches that treat demographics as auxiliary features, MeCaMIL employs principled causal inference, leveraging do-calculus and collider structures, to disentangle disease-relevant signals from spurious demographic correlations. Extensive evaluation on three benchmarks demonstrates state-of-the-art performance across CAMELYON16 (ACC/AUC/F1: 0.939/0.983/0.946), TCGA-Lung (0.935/0.979/0.931), and TCGA-Multi (0.977/0.993/0.970, five cancer types). Critically, MeCaMIL achieves superior fairness: demographic disparity variance drops by over 65% on average across attributes (relative reduction), with notable improvements for underserved populations. The framework also generalizes to survival prediction (mean C-index: 0.653, +0.017 over the best baseline across five cancer types). Ablation studies confirm that the causal graph structure is essential: alternative designs reduce accuracy by 0.048 and yield 4.2× worse fairness. These results establish MeCaMIL as a principled framework for fair, interpretable, and clinically actionable AI in digital pathology. Code will be released upon acceptance.
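For readers unfamiliar with the patch-level aggregation the abstract refers to, attention-based MIL pooling can be sketched as below. This is a generic, illustrative NumPy implementation of attention pooling over a bag of patch embeddings, not MeCaMIL's actual code; the function name, weight shapes, and dimensions are hypothetical.

```python
import numpy as np

def attention_mil_pool(H, W_att, w_score):
    """Attention-based MIL pooling over one WSI bag (illustrative sketch).

    H       : (n_patches, d) patch embeddings extracted from the slide
    W_att   : (d, k) projection matrix of a small attention MLP (hypothetical)
    w_score : (k,) vector mapping hidden units to a scalar attention logit
    Returns the bag embedding (d,) and per-patch attention weights (n_patches,).
    """
    logits = np.tanh(H @ W_att) @ w_score      # one attention logit per patch
    a = np.exp(logits - logits.max())          # numerically stable softmax
    a /= a.sum()                               # weights over patches sum to 1
    z = a @ H                                  # weighted sum -> bag embedding
    return z, a

# Toy usage: 8 patches with 4-dim features.
rng = np.random.default_rng(0)
H = rng.normal(size=(8, 4))
W_att = rng.normal(size=(4, 3))
w_score = rng.normal(size=3)
z, a = attention_mil_pool(H, W_att, w_score)
print(z.shape, a.shape)
```

The bag embedding `z` then feeds a slide-level classifier; the weights `a` are what attention-based MIL methods typically visualize as "interpretability", which the abstract argues is correlational rather than causal.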
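The do-calculus step the abstract alludes to corresponds, in its standard backdoor-adjustment form, to averaging out observed confounders. This is the textbook identity, shown here for orientation only, not a claim about MeCaMIL's exact estimator:

```latex
P\bigl(Y \mid \mathrm{do}(X)\bigr) \;=\; \sum_{z} P\bigl(Y \mid X,\, Z = z\bigr)\, P(Z = z)
```

Here $X$ would be the slide-derived representation, $Y$ the diagnostic label, and $Z$ the demographic confounders (age, gender, race); the identity holds when $Z$ blocks all backdoor paths from $X$ to $Y$, which is exactly what a structured causal graph is used to argue.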