🤖 AI Summary
Existing multiple instance learning (MIL) methods for whole-slide image (WSI) analysis over-rely on feature aggregation while neglecting instance-level representation learning; they also commonly assume that pretrained feature extractors can be directly transferred, leading to suboptimal performance. To address this, we propose the first weakly supervised pretraining framework specifically designed for MIL: it generates pseudo-instance labels via bag-level label propagation and jointly optimizes instance features using strong data augmentation, a nonlinear prediction head, and a robust loss function. Our approach requires no pixel-level annotations and supports pathology-specific fine-tuning as well as joint pretraining across multiple datasets. Evaluated on several large-scale WSI benchmarks, it consistently outperforms both ImageNet pretraining and state-of-the-art self-supervised methods on downstream classification and survival prediction tasks.
📝 Abstract
Various multi-instance learning (MIL) approaches have been developed and successfully applied to whole-slide pathological image (WSI) analysis. Existing MIL methods emphasize the importance of feature aggregators but largely neglect instance-level representation learning. They assume that a pre-trained feature extractor is available and can be directly utilized or fine-tuned, which is not always the case. This paper proposes to pre-train the feature extractor for MIL via a weakly-supervised scheme, i.e., by propagating the weak bag-level labels to the corresponding instances for supervised learning. To learn effective features for MIL, we further delve into several key components, including strong data augmentation, a non-linear prediction head, and a robust loss function. We conduct experiments on common large-scale WSI datasets and find that our scheme achieves better performance than other pre-training schemes (e.g., ImageNet pre-training and self-supervised learning) on different downstream tasks. We further show the compatibility and scalability of the proposed scheme by deploying it to fine-tune pathology-specific models and to pre-train on merged multiple datasets. To our knowledge, this is the first work focusing on representation learning for MIL.
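The core of the weakly-supervised scheme — propagating each bag's label to its instances and training against those noisy pseudo-labels with a robust loss — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and generalized cross-entropy is assumed here as one concrete choice of noise-robust loss (the paper only specifies "a robust loss function").

```python
def propagate_bag_labels(bags):
    """Assign each instance (e.g., a WSI patch) the weak label of its bag.

    `bags` is a list of (instances, bag_label) pairs. Returns a flat list
    of (instance, pseudo_label) pairs usable for supervised pre-training.
    These pseudo-labels are noisy: negative bags yield clean negative
    instances, but positive bags may contain negative instances, which is
    why a noise-robust loss is needed downstream.
    """
    return [(inst, label) for instances, label in bags for inst in instances]


def generalized_cross_entropy(p_true, q=0.7):
    """Generalized cross-entropy L_q(p) = (1 - p^q) / q, an illustrative
    robust loss: it interpolates between cross-entropy (q -> 0) and MAE
    (q = 1), down-weighting instances whose noisy pseudo-label the model
    confidently disagrees with.

    `p_true` is the model's predicted probability for the pseudo-label.
    """
    return (1.0 - p_true ** q) / q
```

In a full pipeline, each instance would pass through strong augmentation and the feature extractor plus a non-linear prediction head before `generalized_cross_entropy` is applied to the head's output; the propagation step above is what turns bag-level supervision into the instance-level signal that makes this possible.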