🤖 AI Summary
In whole-slide image analysis, the scarcity of annotations and the use of bag-level labels lead to sparse patch-level supervision, making it hard to identify discriminative regions and causing unstable training. To address this, the work proposes a spatially regularized multiple instance learning (MIL) framework that, for the first time, leverages the inherent spatial dependencies among patches as a label-agnostic regularization signal. By jointly optimizing a spatial feature-reconstruction objective and a classification objective, the method enforces consistency between the data's intrinsic structure and the supervisory signal. Extensive experiments on multiple public datasets show that the approach significantly outperforms existing methods, alleviating the sparse-supervision problem while improving training stability and generalization.
📝 Abstract
Whole-slide images, with their gigapixel-scale panoramas of tissue samples, are pivotal for precise disease diagnosis. However, their analysis is hindered by immense data size and scarce annotations. Existing multiple instance learning (MIL) methods face a fundamental imbalance: a single bag-level label must guide the learning of numerous patch-level features. This sparse supervision makes it difficult to reliably identify discriminative patches during training, leading to unstable optimization and suboptimal solutions. We propose a spatially regularized MIL framework that leverages inherent spatial relationships among patch features as label-independent regularization signals. Our approach learns a shared representation space by jointly optimizing feature-induced spatial reconstruction and label-guided classification objectives, enforcing consistency between intrinsic structural patterns and supervisory signals. Experimental results on multiple public datasets demonstrate significant improvements over state-of-the-art methods, offering a promising direction for whole-slide image analysis under sparse supervision.
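To make the joint objective concrete, here is a minimal sketch of how a bag-level classification loss can be combined with a label-free spatial reconstruction regularizer over a shared patch embedding. Every specific design choice below is an assumption for illustration, not taken from the paper: attention-based MIL pooling, reconstruction of each patch embedding from the mean of its `k` nearest spatial neighbors, an MSE penalty, and a trade-off weight `lambda_spatial`.

```python
# Hypothetical sketch of a spatially regularized MIL objective.
# Assumed (not from the paper): attention pooling, k-NN neighbor-mean
# reconstruction, MSE penalty, and the lambda_spatial trade-off weight.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatiallyRegularizedMIL(nn.Module):
    def __init__(self, in_dim=1024, feat_dim=512, hidden_dim=256, num_classes=2):
        super().__init__()
        # Shared representation consumed by both objectives.
        self.embed = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        # Attention-based MIL pooling and bag-level classifier.
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, 1)
        )
        self.classifier = nn.Linear(feat_dim, num_classes)
        # Reconstruction head: predicts a patch embedding from the mean
        # embedding of its spatial neighbors (hypothetical design choice).
        self.recon = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, feat_dim)
        )

    def forward(self, feats, coords, k=8):
        # feats: (N, in_dim) precomputed patch features for one slide (bag);
        # coords: (N, 2) spatial grid positions of the patches.
        h = self.embed(feats)                              # shared space, (N, D)

        # Label-guided branch: attention-pooled bag prediction.
        a = torch.softmax(self.attn(h), dim=0)             # (N, 1) attention
        logits = self.classifier((a * h).sum(dim=0))       # (num_classes,)

        # Label-free spatial regularizer: each patch embedding should be
        # predictable from its k nearest spatial neighbors.
        with torch.no_grad():
            dist = torch.cdist(coords.float(), coords.float())
            dist.fill_diagonal_(float("inf"))              # exclude self
            nbr_idx = dist.topk(k, largest=False).indices  # (N, k)
        nbr_mean = h[nbr_idx].mean(dim=1)                  # (N, D)
        # Detached target is a stabilizing choice, also an assumption.
        recon_loss = F.mse_loss(self.recon(nbr_mean), h.detach())
        return logits, recon_loss


# Usage: jointly optimize classification and spatial reconstruction.
model = SpatiallyRegularizedMIL()
feats = torch.randn(100, 1024)               # e.g. frozen-encoder patch features
coords = torch.randint(0, 50, (100, 2))      # patch grid coordinates
label = torch.tensor(1)                      # bag-level label
lambda_spatial = 0.5                         # assumed trade-off weight
logits, recon_loss = model(feats, coords)
loss = F.cross_entropy(logits.unsqueeze(0), label.unsqueeze(0)) \
    + lambda_spatial * recon_loss
loss.backward()
```

The point the sketch captures is that the reconstruction term depends only on patch features and coordinates, never on labels, so it supplies a dense training signal for every patch even when the bag label is the only annotation.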