🤖 AI Summary
This work addresses the trade-off between interpretability and performance in whole-slide histopathology image analysis by adding a lightweight selection head to a strong multiple instance learning (MIL) backbone. The head is trained with a budgeted-sufficiency objective that identifies a small set of discriminative tiles as diagnostic evidence without requiring additional annotations. The objective combines soft per-tile gates with a hinge loss that requires the true-class probability, computed from the kept tiles only, to reach a threshold $\tau$ under an explicit budget on the number of selected tiles. The resulting evidence sets are small and spatially compact. The approach is evaluated on TCGA-NSCLC, TCGA-BRCA, and PANDA. On NSCLC it attains an AUC of 0.983, with a mean minimal sufficient K (MSK) of about 8.2 tiles at $\tau = 0.90$ and an AUKC of about 0.864. Across datasets it matches or slightly improves baseline AUC while adding quantitative measures of how much evidence a prediction needs.
📝 Abstract
We introduce ReaMIL (Reasoning- and Evidence-Aware MIL), a multiple instance learning approach for whole-slide histopathology that adds a lightweight selection head to a strong MIL backbone. The head produces soft per-tile gates and is trained with a budgeted-sufficiency objective: a hinge loss that requires the true-class probability to be $\geq \tau$ using only the kept evidence, under a sparsity budget on the number of selected tiles. This objective yields small, spatially compact evidence sets without sacrificing baseline performance. Across TCGA-NSCLC (LUAD vs. LUSC), TCGA-BRCA (IDC vs. Others), and PANDA, ReaMIL matches or slightly improves baseline AUC and provides quantitative evidence-efficiency diagnostics. On NSCLC, it attains AUC 0.983 with a mean minimal sufficient K (MSK) $\approx 8.2$ tiles at $\tau = 0.90$ and AUKC $\approx 0.864$, showing that class confidence rises sharply and stabilizes once a small set of tiles is kept. The method requires no extra supervision, integrates with standard MIL training, and naturally yields slide-level overlays. We report accuracy alongside MSK, AUKC, and contiguity for rigorous evaluation of model behavior on WSIs.
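To make the budgeted-sufficiency idea concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract specifies only a hinge on the true-class probability at threshold $\tau$ and a sparsity budget on selected tiles. The additive budget penalty, the weight `lam`, the top-K-by-gate reading of MSK, and the toy `toy_prob_fn` classifier are all hypothetical choices made for illustration.

```python
import math
from typing import Callable, Optional, Sequence


def budgeted_sufficiency_loss(
    p_true_kept: float,
    gates: Sequence[float],
    tau: float = 0.90,
    budget: float = 8.0,
    lam: float = 1.0,
) -> float:
    """Hinge on the true-class probability plus an (assumed) budget penalty.

    p_true_kept: true-class probability computed from the gated tiles only.
    gates: soft per-tile gates in [0, 1].
    The penalty form below is an assumption; the abstract does not give it.
    """
    sufficiency = max(0.0, tau - p_true_kept)        # zero once p >= tau
    over_budget = max(0.0, sum(gates) - budget)      # zero while within budget
    return sufficiency + lam * over_budget


def minimal_sufficient_k(
    gates: Sequence[float],
    prob_fn: Callable[[Sequence[int]], float],
    tau: float = 0.90,
) -> Optional[int]:
    """Smallest K such that the top-K tiles by gate score reach probability tau.

    Returns None if no prefix of the ranking is sufficient. Ranking by gate
    score is an assumed reading of "minimal sufficient K".
    """
    order = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)
    for k in range(1, len(order) + 1):
        if prob_fn(order[:k]) >= tau:
            return k
    return None


# Toy stand-in for a slide classifier: sigmoid of the summed tile logits.
toy_tile_logits = [1.5, 0.2, 1.0, -0.1]
toy_gates = [0.9, 0.1, 0.8, 0.05]


def toy_prob_fn(kept: Sequence[int]) -> float:
    z = sum(toy_tile_logits[i] for i in kept)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    print(minimal_sufficient_k(toy_gates, toy_prob_fn, tau=0.90))
    print(budgeted_sufficiency_loss(0.80, toy_gates, tau=0.90, budget=2.0))
```

In this toy setting the highest-gated tile alone falls short of $\tau = 0.90$ and the second one pushes the probability over the threshold, which mirrors the behavior the abstract describes: confidence rises sharply once a small set of tiles is kept. Averaging `minimal_sufficient_k` over slides would give an MSK-style summary under the same assumptions.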