🤖 AI Summary
To address overfitting in whole-slide image (WSI) classification caused by attention over-concentration in multiple instance learning (MIL) models, this paper proposes Attention Entropy Maximization (AEM), a novel regularization method. AEM augments standard MIL frameworks with a negative-entropy attention regularizer that explicitly maximizes the entropy of instance-level attention distributions, thereby mitigating attention bias and improving generalization. The method is lightweight: it requires no auxiliary modules, multi-stage training, or intricate hyperparameter tuning (only a single tunable hyperparameter) and integrates seamlessly into existing pipelines. Extensive experiments demonstrate that AEM consistently improves classification accuracy across three established WSI benchmarks. It is broadly compatible with four feature extractors (e.g., ResNet, ViTs), two MIL architectures (ABMIL, DSMIL), three attention mechanisms, and subsampling-based data augmentation. The implementation is publicly available.
📝 Abstract
Multiple Instance Learning (MIL) has demonstrated effectiveness in analyzing whole slide images (WSIs), yet it often encounters overfitting challenges in real-world applications, particularly in the form of attention over-concentration. While existing methods to alleviate this issue introduce complex modules or processing steps, such as multi-stage training and teacher-student distillation, this paper proposes a simple yet effective regularization: Attention Entropy Maximization (AEM). Motivated by our investigation revealing a positive correlation between attention entropy and model performance, AEM incorporates a negative entropy loss on attention values into the standard MIL framework, penalizing overly concentrated attention and encouraging the model to consider a broader range of informative regions in WSIs, potentially improving its generalization capabilities. Compared to existing overfitting mitigation methods, AEM offers simplicity, efficiency, and versatility: it requires no additional modules or processing steps, involves only one hyperparameter, and is compatible with a variety of MIL frameworks and techniques. These advantages make AEM particularly attractive for practical applications. We evaluate AEM on three benchmark datasets, demonstrating consistent performance improvements over existing methods. Furthermore, AEM shows high versatility, integrating effectively with four feature extractors, two advanced MIL frameworks, three attention mechanisms, and the subsampling augmentation technique. The source code is available at https://github.com/dazhangyu123/AEM.
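The core idea (a negative entropy loss added to the standard MIL objective) can be sketched in a few lines. This is an illustrative reconstruction based only on the abstract, not the authors' actual code; the function names, the `lam` hyperparameter name, and the use of NumPy are all assumptions.

```python
import numpy as np

def negative_entropy(attention, eps=1e-12):
    """Negative Shannon entropy of instance-level attention weights.

    Minimizing this term maximizes the entropy of the attention
    distribution, penalizing over-concentrated attention and
    encouraging the model to spread attention over more instances.
    """
    a = attention / attention.sum()            # normalize to a distribution
    return float(np.sum(a * np.log(a + eps)))  # equals -H(a)

def aem_loss(cls_loss, attention, lam=0.1):
    """Total objective: standard MIL classification loss plus the
    entropy regularizer, weighted by the single hyperparameter `lam`
    (name assumed here)."""
    return cls_loss + lam * negative_entropy(attention)

# Over-concentrated attention incurs a larger penalty than uniform attention:
peaked = np.array([0.97, 0.01, 0.01, 0.01])
uniform = np.full(4, 0.25)
assert negative_entropy(peaked) > negative_entropy(uniform)
```

In a real MIL pipeline (e.g. ABMIL or DSMIL), `attention` would be the aggregator's instance weights for one slide and `cls_loss` the slide-level cross-entropy; the regularizer adds no parameters or extra training stages.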