🤖 AI Summary
To address the high annotation cost of fully supervised learning in histopathological cell segmentation, this paper proposes a bounding-box weakly supervised method that requires no fine-tuning of the Segment Anything Model (SAM). The approach uses SAM’s zero-shot box prompting capability to generate pseudo-masks directly from annotated boxes, and these pseudo-masks train a lightweight, standalone segmentation network. The paper further introduces a dual-path mask fusion mechanism based on integer programming, which reconciles SAM’s detection-box-driven outputs with the segmentation network’s predictions under constraints of intensity consistency and spatial coherence. Notably, this is the first work to seamlessly integrate SAM’s box prompting throughout both training and inference, eliminating the need for model fine-tuning or architectural adaptation. Evaluated on the CoNSep, MoNuSeg, and TNBC datasets, the method improves Dice scores by 6–10 percentage points over existing box-supervised approaches, making weakly supervised cell segmentation considerably more practical.
📝 Abstract
Cell segmentation in histopathological images is vital for the diagnosis and treatment of several diseases. Annotating data is tedious and requires medical expertise, making it difficult to employ supervised learning. Instead, we study a weakly supervised setting in which only bounding box supervision is available, and present the use of Segment Anything (SAM) for this setting without any finetuning, i.e., directly utilizing the pre-trained model. We propose BoxCell, a cell segmentation framework that utilizes SAM's capability to interpret bounding boxes as prompts, both at train and test times. At train time, gold bounding boxes given to SAM produce (pseudo-)masks, which are used to train a standalone segmenter. At test time, BoxCell generates two segmentation masks: (1) one produced by this standalone segmenter, and (2) one produced by SAM, prompted with bounding boxes output by a trained object detector. Recognizing their complementary strengths, we reconcile the two segmentation masks using a novel integer programming formulation with intensity and spatial constraints. We experiment on three publicly available cell segmentation datasets, namely CoNSep, MoNuSeg, and TNBC, and find that BoxCell significantly outperforms existing box-supervised image segmentation models, obtaining gains of 6-10 Dice points.
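
For readers unfamiliar with the metric behind the reported gains, the following is a minimal sketch of the standard Dice coefficient for two binary masks, written in plain Python. It is an illustration only, not the evaluation code used in the paper; the function name `dice_score`, the nested-list mask representation, and the convention for two empty masks are assumptions made for this example. A "point" is read here as 0.01 on the 0 to 1 Dice scale, in line with the summary's "percentage points".

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; not taken from the BoxCell code.
    """
    # Flatten both masks and coerce every pixel to 0 or 1.
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_target = [int(bool(v)) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # Overlap: pixels marked as foreground in both masks.
    intersection = sum(p & t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks: 3 predicted pixels, 3 true pixels, 2 of them shared.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
print(f"Dice = {score:.4f}")
```

On this scale, a gain of 6 to 10 points corresponds to raising the Dice coefficient by 0.06 to 0.10.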