GCE-MIL: Faithful and Recoverable Evidence for Multiple Instance Learning in Whole-Slide Imaging

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing multiple instance learning (MIL) approaches treat attention weights as diagnostic evidence, so the image patches they select lack sufficiency, necessity, and recoverability (S/N/R), the three properties required for faithfully representing discriminative regions. To address this, the work proposes GCE-MIL, a framework that optimizes evidence quality as an explicit, independent objective. By combining a concept alignment mechanism, a differentiable noisy-OR coverage model, and a threshold-based repair strategy, GCE-MIL improves the S/N/R characteristics of the selected evidence and makes the transition from continuous selection to discrete subsets reliable. The method is backbone-agnostic and shows consistent improvements across nine backbones and nine datasets: average gains of 0.024 in Macro-F1 and 0.014 in C-index, a 4–7× reduction in the continuous-to-discrete performance gap, and inference up to 5× faster while retaining 98.9% of full-bag performance.
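To make the noisy-OR coverage and threshold-plus-repair ideas concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The patch-concept affinity matrix, the coverage target `target`, the threshold `tau`, and the greedy marginal-gain repair rule are all assumptions introduced for illustration. Coverage of concept k is taken to be the standard noisy-OR, 1 - prod_i (1 - s_i * a_ik), which is smooth in the selector scores s_i.

```python
from math import prod


def noisy_or_coverage(selector, affinity):
    """Per-concept noisy-OR coverage: c_k = 1 - prod_i (1 - s_i * a_ik).

    selector: list of patch scores s_i in [0, 1]
    affinity: list of rows a_i, each a list of a_ik in [0, 1] (hypothetical)
    """
    n_concepts = len(affinity[0])
    return [
        1.0 - prod(1.0 - s * row[k] for s, row in zip(selector, affinity))
        for k in range(n_concepts)
    ]


def threshold_and_repair(selector, affinity, tau=0.5, target=0.6, max_added=10):
    """Threshold the selector, then greedily add patches until every concept
    reaches `target` coverage (an assumed repair rule, not the paper's)."""
    chosen = [s >= tau for s in selector]

    def capped_total(mask):
        hard = [1.0 if m else 0.0 for m in mask]
        # Cap at `target` so only coverage deficits contribute to the gain.
        return sum(min(c, target) for c in noisy_or_coverage(hard, affinity))

    goal = target * len(affinity[0])
    for _ in range(max_added):
        current = capped_total(chosen)
        if current >= goal - 1e-12:
            break  # every concept is covered to the target level
        best_gain, best_i = 0.0, None
        for i, already in enumerate(chosen):
            if already:
                continue
            trial = chosen.copy()
            trial[i] = True
            gain = capped_total(trial) - current  # marginal coverage gain
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:
            break  # no remaining patch improves coverage
        chosen[best_i] = True
    return [i for i, m in enumerate(chosen) if m]


# Toy bag: 4 patches, 2 concepts.
selector = [0.9, 0.4, 0.1, 0.3]
affinity = [[0.9, 0.0], [0.1, 0.8], [0.0, 0.2], [0.2, 0.1]]
soft_coverage = noisy_or_coverage(selector, affinity)
discrete_subset = threshold_and_repair(selector, affinity)
```

In this toy bag, thresholding alone keeps only patch 0, which leaves the second concept uncovered; the repair step then adds the patch with the largest marginal coverage gain.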

📝 Abstract

Multiple instance learning (MIL) is the standard approach for whole-slide image (WSI) classification and survival prediction, where attention-based models aggregate patch features into slide-level predictions. These models treat attention weights as evidence for their predictions, but attention is optimized for classification, not for identifying which patches actually support the diagnosis. This conflation leads to three failures: selected patches are insufficient (keeping them alone drops Macro-F1 by 0.078), unnecessary (removing them barely changes the prediction), and unrecoverable (continuous attention scores disagree with discrete patch subsets used at inference). The central premise is that evidence quality should be optimized directly through explicit criteria, namely Sufficiency, Necessity, and Recoverability (S/N/R), rather than inherited as a byproduct of classification. GCE-MIL is a backbone-agnostic wrapper implemented through three injection modes and three evidence components: a grounding mechanism that aligns selection with domain-specific concepts, noisy-OR coverage that acts as a differentiable proxy for interventional evidence search, and threshold-plus-repair recovery that converts continuous selectors into discrete subsets through marginal-guided repair. Across 9 backbones and 9 datasets (81 configurations), GCE-MIL improves average Macro-F1 by 0.024 and C-index by 0.014, reduces the continuous-discrete gap by 4-7×, and increases complement degradation by 2-4×. With optional tile prefiltering after discrete recovery, inference runs up to 5× faster while retaining 0.989 full-bag utility.
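The three failure modes can be read as simple interventions on the bag: keep only the selected patches, remove them, or compare the continuous selector with its thresholded version. The sketch below illustrates that reading with a hypothetical slide model (weighted-mean pooling followed by a logistic head). The function names, the 0.5 threshold, and the use of absolute score differences are assumptions for illustration, not the metrics reported in the paper.

```python
from math import exp


def slide_score(features, weights, head):
    """Hypothetical slide model: weighted-mean pooling, then a logistic head."""
    total = sum(weights)
    if total <= 0.0:
        return 0.5  # empty bag: no evidence either way
    pooled = [
        sum(w * f[d] for w, f in zip(weights, features)) / total
        for d in range(len(head))
    ]
    logit = sum(p * h for p, h in zip(pooled, head))
    return 1.0 / (1.0 + exp(-logit))


def snr_gaps(features, selector, head, threshold=0.5):
    """Interventional S/N/R checks on one bag (illustrative definitions)."""
    keep = [1.0 if s >= threshold else 0.0 for s in selector]
    drop = [1.0 - k for k in keep]
    full = slide_score(features, [1.0] * len(features), head)
    kept_only = slide_score(features, keep, head)        # selected patches only
    complement = slide_score(features, drop, head)       # selected patches removed
    continuous = slide_score(features, selector, head)   # soft selector weights
    return {
        # Small gap: the selected patches alone reproduce the prediction.
        "sufficiency_gap": abs(full - kept_only),
        # Large drop: the prediction degrades once the evidence is removed.
        "necessity_drop": abs(full - complement),
        # Small gap: discrete subset agrees with the continuous selector.
        "recoverability_gap": abs(continuous - kept_only),
    }


# Toy bag: two informative patches, two background patches.
features = [[4.0, 0.0], [4.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
head = [1.0, 0.0]
selector = [0.9, 0.8, 0.1, 0.2]
gaps = snr_gaps(features, selector, head)
```

Under this reading, faithful evidence shows a small sufficiency gap, a large necessity drop (the "complement degradation" in the abstract), and a small recoverability gap.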

Problem

Research questions and friction points this paper is trying to address.

Multiple Instance Learning

Whole-Slide Imaging

Attention Mechanism

Evidence Faithfulness

Interpretability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiple Instance Learning

Evidence Quality

Sufficiency-Necessity-Recoverability