🤖 AI Summary
This study addresses semantic segmentation of histopathology images in the absence of pixel-level annotations by proposing VSLP, a two-stage framework. First, a pretrained Vision Transformer generates pixel-wise confidence maps; these dense predictions are then refined through a variational optimization scheme that combines a Wasserstein data fidelity term with a learned regularizer. Notably, this work is the first to incorporate learned regularization into a variational segmentation framework that relies solely on global tissue proportions, and the energy can be visualized to improve model interpretability. Evaluated on two public datasets, VSLP outperforms existing weakly supervised and unsupervised methods, and it substantially outperforms current state-of-the-art approaches on an internal dataset with noisy labels.
📝 Abstract
In pathology, the spatial distribution and proportions of tissue types are key indicators of disease progression, and are more readily available than fine-grained annotations. However, these assessments are rarely mapped to pixel-wise segmentations. The task is fundamentally underdetermined, as many spatially distinct segmentations can satisfy the same global proportions in the absence of pixel-wise constraints. To address this, we introduce Variational Segmentation from Label Proportions (VSLP), a two-stage framework that infers dense segmentations from global label proportions, without any pixel-level annotations. In the first stage, a pre-trained transformer model with test-time augmentation produces a pixel-wise confidence estimate. In the second stage, these estimates are fused by solving a variational optimization problem that combines a Wasserstein data fidelity term with a learned regularizer. Unlike end-to-end networks, our variational method allows the fidelity-regularization energy to be visualized, making the resulting segmentations more interpretable. We validate our approach on two public datasets, achieving superior performance over existing weakly supervised and unsupervised methods. For one of these datasets, the proportions were estimated by an experienced pathologist to provide a realistic benchmark for the community. Furthermore, the method scales to an in-house dataset with noisy pathologist labels, where it substantially outperforms state-of-the-art methods, demonstrating practical applicability. The code and data will be made publicly available upon acceptance at https://github.com/xiaoliangpi/VSLP.
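To make the underdetermination noted in the abstract concrete, the following minimal sketch builds two spatially different toy segmentations that share identical global label proportions. It is an illustration only and is not taken from the VSLP code: the helper `label_proportions`, the 4x4 grid, and the two-class setup are assumptions chosen for brevity.

```python
import numpy as np


def label_proportions(seg: np.ndarray, num_classes: int) -> np.ndarray:
    """Return the fraction of pixels assigned to each class (hypothetical helper)."""
    counts = np.bincount(seg.ravel(), minlength=num_classes)
    return counts / seg.size


# Two toy 4x4 segmentations with two tissue classes (0 and 1).
seg_a = np.zeros((4, 4), dtype=int)
seg_a[:, :2] = 1  # class 1 occupies the left half

seg_b = np.zeros((4, 4), dtype=int)
seg_b[:2, :] = 1  # class 1 occupies the top half

props_a = label_proportions(seg_a, num_classes=2)
props_b = label_proportions(seg_b, num_classes=2)

# The global proportions agree even though the spatial layouts differ.
same_proportions = bool(np.allclose(props_a, props_b))
same_layout = bool(np.array_equal(seg_a, seg_b))
```

Here `same_proportions` is true while `same_layout` is false, so a proportion constraint alone cannot distinguish the two layouts. This is why additional information, such as the pixel-wise confidence estimates and the regularizer described above, is needed to select one segmentation.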