AEM: Attention Entropy Maximization for Multiple Instance Learning based Whole Slide Image Classification

📅 2024-06-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address overfitting in whole-slide image (WSI) classification caused by attention over-concentration in multiple instance learning (MIL) models, this paper proposes Attention Entropy Maximization (AEM), a novel regularization method. AEM augments standard MIL frameworks with a negative-entropy attention regularizer that explicitly maximizes the entropy of instance-level attention distributions, thereby mitigating attention bias and improving generalization. The method is lightweight: it requires no auxiliary modules, multi-stage training, or intricate hyperparameter tuning—only a single tunable hyperparameter—and integrates seamlessly into existing pipelines. Extensive experiments demonstrate that AEM consistently improves classification accuracy across three established WSI benchmarks. It is broadly compatible with four feature extractors (e.g., ResNet, ViTs), two MIL architectures (ABMIL, DSMIL), three attention mechanisms, and subsampling-based data augmentation. The implementation is publicly available.

📝 Abstract
Multiple Instance Learning (MIL) has demonstrated effectiveness in analyzing whole slide images (WSIs), yet it often encounters overfitting challenges in real-world applications, particularly in the form of attention over-concentration. While existing methods to alleviate this issue introduce complex modules or processing steps, such as multiple-stage training and teacher-student distillation, this paper proposes a simple yet effective regularization: Attention Entropy Maximization (AEM). Motivated by our investigation revealing a positive correlation between attention entropy and model performance, AEM incorporates a negative entropy loss for attention values into the standard MIL framework, penalizing overly concentrated attention and encouraging the model to consider a broader range of informative regions in WSIs, potentially improving its generalization capabilities. Compared to existing overfitting mitigation methods, our AEM approach offers advantages of simplicity, efficiency, and versatility. It requires no additional modules or processing steps, involves only one hyperparameter, and demonstrates compatibility with MIL frameworks and techniques. These advantages make AEM particularly attractive for practical applications. We evaluate AEM on three benchmark datasets, demonstrating consistent performance improvements over existing methods. Furthermore, AEM shows high versatility, integrating effectively with four feature extractors, two advanced MIL frameworks, three attention mechanisms, and the Subsampling augmentation technique. The source code is available at https://github.com/dazhangyu123/AEM.
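The core idea in the abstract—adding a negative entropy loss over the instance-level attention values—can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function names and the weight `lam` (the paper's single hyperparameter) are placeholders.

```python
import math

def attention_entropy(attn, eps=1e-12):
    """Shannon entropy of an instance-level attention distribution.

    `attn` is assumed to be non-negative and to sum to 1, e.g. the
    output of a softmax over the instances (patches) in one bag (WSI).
    """
    return -sum(a * math.log(a + eps) for a in attn)

def aem_loss(cls_loss, attn, lam=0.1):
    """AEM-style total loss: classification loss plus lam times the
    *negative* attention entropy.  Minimizing it therefore rewards
    spreading attention across instances rather than concentrating it.
    """
    return cls_loss + lam * (-attention_entropy(attn))
```

A uniform distribution over N instances attains the maximum entropy log N, so at the same classification loss a sharply peaked attention map incurs a larger regularized loss than a diffuse one, which is exactly the over-concentration penalty the abstract describes.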
Problem

Research questions and friction points this paper is trying to address.

Prevents overfitting in whole slide image classification
Enhances attention entropy to improve model performance
Reduces sensitivity to regularization weight parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention Entropy Maximization for regularization
Cosine Weight Annealing reduces parameter sensitivity
Integrates AEM into MIL to prevent overfitting
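The Cosine Weight Annealing named above schedules the AEM weight over training to reduce sensitivity to its setting. A plausible sketch is a half-cosine decay; the exact schedule, direction, and endpoints here are assumptions for illustration, not taken from the paper.

```python
import math

def cosine_annealed_weight(step, total_steps, lam_max, lam_min=0.0):
    """Decay the AEM regularization weight from lam_max to lam_min
    along a half-cosine over total_steps training steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lam_min + 0.5 * (lam_max - lam_min) * (1.0 + math.cos(math.pi * progress))
```

The weight starts at `lam_max`, passes through the midpoint value halfway through training, and reaches `lam_min` at the end, so any single choice of `lam_max` is applied at full strength only early on.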
👥 Authors

Yunlong Zhang
College of Computer Science and Technology, Zhejiang University, China; Research Center for Industries of the Future and School of Engineering, Westlake University, China

Zhongyi Shui
Ph.D. Candidate, Westlake University & Zhejiang University

Yunxuan Sun
College of Computer Science and Technology, Zhejiang University, China; Research Center for Industries of the Future and School of Engineering, Westlake University, China

Honglin Li
Westlake University (Computer Vision, Multimodal LLM, biomedical image analysis)

Jingxiong Li
College of Computer Science and Technology, Zhejiang University, China; Research Center for Industries of the Future and School of Engineering, Westlake University, China

Chenglu Zhu
Research Center for Industries of the Future and School of Engineering, Westlake University, China

Lin Yang
Research Center for Industries of the Future and School of Engineering, Westlake University, China