🤖 AI Summary
Problem: Attention-based multiple instance learning (MIL) for whole-slide image (WSI) classification is vulnerable to staining artifacts and non-causal tissue morphologies, leading to unreliable patch-level predictions and poor interpretability. Method: We propose FocusMIL, a causally grounded max-pooling MIL framework that systematically demonstrates, for the first time, the superior robustness of max-pooling over attention mechanisms in isolating causal histopathological features while avoiding spurious correlations. Its lightweight, end-to-end trainable architecture requires no auxiliary supervision or complex regularization. Results: Evaluated on two public WSI benchmarks, FocusMIL significantly outperforms state-of-the-art attention-based MIL methods in classification accuracy, patch-level prediction consistency, and heatmap interpretability. It establishes a new paradigm for computational pathology that jointly ensures generalizability and causal validity.
📝 Abstract
Although attention-based multiple instance learning (MIL) algorithms have achieved impressive performance on slide-level whole-slide image (WSI) classification tasks, they are prone to focusing on irrelevant patterns such as staining conditions and tissue morphology, leading to incorrect patch-level predictions and unreliable interpretability. In this paper, we analyze why attention-based methods tend to rely on spurious correlations in their predictions. Furthermore, we revisit max-pooling-based approaches and examine the reasons behind the underperformance of existing methods. We argue that well-trained max-pooling-based MIL models can make predictions based on causal factors and avoid relying on spurious correlations. Building on these insights, we propose a simple yet effective max-pooling-based MIL method (FocusMIL) that outperforms existing mainstream attention-based methods on two datasets. In this position paper, we advocate renewed attention to max-pooling-based methods to achieve more robust and interpretable predictions.
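To make the contrast between the two aggregation styles concrete, the following is a minimal sketch of generic max-pooling and attention pooling over per-patch scores. It is not the FocusMIL architecture, which the abstract does not specify; the function names, the example logits, and the use of plain Python lists instead of learned feature tensors are all assumptions made for illustration.

```python
import math


def max_pool_bag_score(instance_logits):
    """Max-pooling MIL: the bag logit is the largest patch logit.

    Returns (bag_logit, index_of_deciding_patch), so the slide-level
    prediction can be traced back to a single patch.
    """
    top = max(range(len(instance_logits)), key=lambda i: instance_logits[i])
    return instance_logits[top], top


def attention_pool_bag_score(instance_logits, attention_logits):
    """Attention pooling: the bag logit is a softmax-weighted mean of patch logits.

    Returns (bag_logit, weights). The weights come from a separate set of
    attention logits, so a patch can receive high weight for reasons
    unrelated to its own class score.
    """
    m = max(attention_logits)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in attention_logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    bag = sum(w * s for w, s in zip(weights, instance_logits))
    return bag, weights


# Hypothetical slide with four patches; only patch 2 looks positive.
patch_logits = [-2.0, -1.5, 3.0, -2.5]
# Hypothetical attention logits that favor patch 1 (e.g. a staining artifact).
attn_logits = [0.1, 2.0, 0.3, 0.0]

max_bag, deciding_patch = max_pool_bag_score(patch_logits)
attn_bag, attn_weights = attention_pool_bag_score(patch_logits, attn_logits)
```

In this toy setup the max-pooled score is determined entirely by the patch with the highest logit, whereas the attention-pooled score is pulled toward whichever patch the attention weights favor, regardless of that patch's own score.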