ASMIL: Attention-Stabilized Multiple Instance Learning for Whole Slide Imaging

📅 2026-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses key limitations of attention mechanisms in multiple instance learning for whole-slide image diagnosis, where unstable attention distributions, excessive concentration, and overfitting often degrade performance. To jointly mitigate these issues, the authors propose ASMIL, a framework that integrates three complementary components: an anchor model that stabilizes attention distributions, a normalized sigmoid activation that replaces softmax to alleviate over-concentration, and token random dropping that suppresses overfitting. These modules are generic and can be incorporated into existing MIL architectures. On public datasets, ASMIL improves F1 score by up to 6.49% over state-of-the-art methods, and integrating the anchor model and normalized sigmoid into existing baselines yields further gains of up to 10.73%.
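The over-concentration issue mentioned above can be seen in a toy comparison. This is a minimal sketch of the general idea, not the paper's implementation; the function names and score values are illustrative assumptions:

```python
import math

def softmax(scores):
    # Standard softmax: exponentials amplify score gaps, so attention
    # mass concentrates on the top-scoring instance.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_sigmoid(scores):
    # Sketch of a normalized sigmoid: squash each score independently
    # into (0, 1), then normalize so the weights sum to 1. Because the
    # sigmoid saturates, large scores are not exponentially amplified.
    sig = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    total = sum(sig)
    return [v / total for v in sig]

scores = [4.0, 2.0, 0.0]
print(round(max(softmax(scores)), 3))             # → 0.867
print(round(max(normalized_sigmoid(scores)), 3))  # → 0.416
```

For the same logits, softmax puts most of the weight on one instance while the normalized sigmoid spreads it far more evenly, which is the over-concentration contrast the summary describes.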

📝 Abstract
Attention-based multiple instance learning (MIL) has emerged as a powerful framework for whole slide image (WSI) diagnosis, leveraging attention to aggregate instance-level features into bag-level predictions. Despite this success, we find that such methods exhibit a new failure mode: unstable attention dynamics. Across four representative attention-based MIL methods and two public WSI datasets, we observe that attention distributions oscillate across epochs rather than converging to a consistent pattern, degrading performance. This instability adds to two previously reported challenges: overfitting and over-concentrated attention distribution. To simultaneously overcome these three limitations, we introduce attention-stabilized multiple instance learning (ASMIL), a novel unified framework. ASMIL uses an anchor model to stabilize attention, replaces softmax with a normalized sigmoid function in the anchor to prevent over-concentration, and applies token random dropping to mitigate overfitting. Extensive experiments demonstrate that ASMIL achieves up to a 6.49% F1 score improvement over state-of-the-art methods. Moreover, integrating the anchor model and normalized sigmoid into existing attention-based MIL methods consistently boosts their performance, with F1 score gains up to 10.73%. All code and data are publicly available at https://github.com/Linfeng-Ye/ASMIL.
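A minimal sketch of how two of the generic components described in the abstract, sigmoid-based attention pooling and token random dropping, might slot into an attention-MIL aggregator. All names, the scorer, and the drop rate are illustrative assumptions, not the released code:

```python
import math
import random

def attention_pool(instances, scorer, drop_rate=0.2, training=True):
    """Aggregate instance features into one bag-level feature vector.

    instances: list of equal-length feature vectors (lists of floats)
    scorer:    callable mapping one feature vector to a scalar attention score
    """
    # Token random dropping (illustrative): during training, drop each
    # instance with probability drop_rate so the model cannot overfit
    # to a few dominant patches.
    if training and drop_rate > 0.0:
        kept = [x for x in instances if random.random() >= drop_rate]
        if kept:  # never drop the whole bag
            instances = kept

    # Normalized sigmoid attention: squash each score independently,
    # then normalize into a convex combination (flatter than softmax).
    sig = [1.0 / (1.0 + math.exp(-scorer(x))) for x in instances]
    total = sum(sig)
    weights = [s / total for s in sig]

    dim = len(instances[0])
    return [sum(w * x[d] for w, x in zip(weights, instances)) for d in range(dim)]

# Evaluation pass (no dropping); the scorer here is just the feature sum.
bag = attention_pool([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                     scorer=sum, training=False)
```

The anchor model is not sketched here; per the abstract it supplies a stable reference for the attention weights across epochs, which would sit alongside (not inside) a pooling function like this one.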
Problem

Research questions and friction points this paper is trying to address.

attention instability
multiple instance learning
whole slide imaging
overfitting
attention concentration
Innovation

Methods, ideas, or system contributions that make the work stand out.

attention stabilization
multiple instance learning
whole slide imaging
normalized sigmoid
token dropping