🤖 AI Summary
To address overfitting in whole-slide image (WSI) classification caused by attention over-concentration in multiple instance learning (MIL) models, this paper proposes Attention Entropy Maximization (AEM), a novel regularization method. AEM augments standard MIL frameworks with a negative-entropy attention regularizer that explicitly maximizes the entropy of instance-level attention distributions, thereby mitigating attention bias and improving generalization. The method is lightweight: it requires no auxiliary modules, multi-stage training, or intricate hyperparameter tuning (only a single tunable hyperparameter) and integrates seamlessly into existing pipelines. Extensive experiments demonstrate that AEM consistently improves classification accuracy across three established WSI benchmarks. It is broadly compatible with four feature extractors (e.g., ResNet, ViTs), two MIL architectures (ABMIL, DSMIL), three attention mechanisms, and subsampling-based data augmentation. The implementation is publicly available.
📝 Abstract
Multiple Instance Learning (MIL) has demonstrated effectiveness in analyzing whole slide images (WSIs), yet it often encounters overfitting challenges in real-world applications, particularly in the form of attention over-concentration. While existing methods to alleviate this issue introduce complex modules or processing steps, such as multi-stage training and teacher-student distillation, this paper proposes a simple yet effective regularization: Attention Entropy Maximization (AEM). Motivated by our investigation revealing a positive correlation between attention entropy and model performance, AEM incorporates a negative entropy loss on attention values into the standard MIL framework, penalizing overly concentrated attention and encouraging the model to consider a broader range of informative regions in WSIs, potentially improving its generalization capabilities. Compared to existing overfitting mitigation methods, AEM offers simplicity, efficiency, and versatility: it requires no additional modules or processing steps, involves only one hyperparameter, and is compatible with a variety of MIL frameworks and techniques. These advantages make AEM particularly attractive for practical applications. We evaluate AEM on three benchmark datasets, demonstrating consistent performance improvements over existing methods. Furthermore, AEM shows high versatility, integrating effectively with four feature extractors, two advanced MIL frameworks, three attention mechanisms, and the subsampling augmentation technique. The source code is available at https://github.com/dazhangyu123/AEM.
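The core idea (a negative entropy loss added to the standard MIL objective) can be sketched in a few lines. This is an illustrative reconstruction based only on the abstract, not the authors' actual code; the function names, the `lam` hyperparameter name, and the use of NumPy are all assumptions.

```python
import numpy as np

def negative_entropy(attention, eps=1e-12):
    """Negative Shannon entropy of instance-level attention weights.

    Minimizing this term maximizes the entropy of the attention
    distribution, penalizing over-concentrated attention and
    encouraging the model to spread attention over more instances.
    """
    a = attention / attention.sum()            # normalize to a distribution
    return float(np.sum(a * np.log(a + eps)))  # equals -H(a)

def aem_loss(cls_loss, attention, lam=0.1):
    """Total objective: standard MIL classification loss plus the
    entropy regularizer, weighted by the single hyperparameter `lam`
    (name assumed here)."""
    return cls_loss + lam * negative_entropy(attention)

# Over-concentrated attention incurs a larger penalty than uniform attention:
peaked = np.array([0.97, 0.01, 0.01, 0.01])
uniform = np.full(4, 0.25)
assert negative_entropy(peaked) > negative_entropy(uniform)
```

In a real MIL pipeline (e.g. ABMIL or DSMIL), `attention` would be the aggregator's instance weights for one slide and `cls_loss` the slide-level cross-entropy; the regularizer adds no parameters or extra training stages.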