🤖 AI Summary
Existing medical image segmentation models either support a single segmentation protocol or rely on manual prompts, so they cannot automatically produce concurrent segmentations under diverse semantic protocols (such as tissue, anatomy, and pathology) in unseen domains.
Method: We propose a novel multi-protocol consistent segmentation paradigm, built upon a vision–semantics-aligned multi-head decoder architecture. It incorporates protocol-aware feature disentanglement and cross-image semantic consistency regularization, trained via a zero-shot domain generalization strategy to enable fully automatic, multi-label, semantically consistent segmentation without human intervention.
Contribution/Results: Evaluated on seven held-out datasets, our method outperforms existing foundation models in multi-protocol segmentation plausibility, cross-image semantic consistency, and zero-shot generalization, reducing the reliance of foundation models on a single protocol or on heavy manual prompting.
📝 Abstract
A single biomedical image can be meaningfully segmented in multiple ways, depending on the desired application. For instance, a brain MRI can be segmented according to tissue types, vascular territories, broad anatomical regions, fine-grained anatomy, or pathology. Existing automatic segmentation models typically either (1) support only a single protocol, the one they were trained on, or (2) require labor-intensive manual prompting to specify the desired segmentation. We introduce Pancakes, a framework that, given a new image from a previously unseen domain, automatically generates multi-label segmentation maps for multiple plausible protocols while maintaining semantic consistency across related images. Pancakes addresses a new problem formulation that existing foundation models cannot currently handle. In a series of experiments on seven held-out datasets, we demonstrate that our model significantly outperforms existing foundation models in producing several plausible whole-image segmentations that are semantically coherent across images.
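To make the idea of several protocols for one image concrete, the toy sketch below labels the same six-pixel "image" in three ways and checks whether one labeling is a finer subdivision of another. It is only an illustration of the problem setting, not the Pancakes method: the label lists, the protocol names, and the helpers `label_regions` and `refines` are all hypothetical, and the abstract does not state that protocols must be nested.

```python
from typing import Dict, List, Set


def label_regions(seg: List[int]) -> Dict[int, Set[int]]:
    """Map each label to the set of pixel indices it covers."""
    regions: Dict[int, Set[int]] = {}
    for idx, label in enumerate(seg):
        regions.setdefault(label, set()).add(idx)
    return regions


def refines(fine: List[int], coarse: List[int]) -> bool:
    """Return True if every region of `fine` lies inside one region of `coarse`."""
    if len(fine) != len(coarse):
        raise ValueError("segmentations must cover the same pixels")
    return all(
        len({coarse[i] for i in pixels}) == 1
        for pixels in label_regions(fine).values()
    )


# Hypothetical label maps for one flattened 6-pixel image (assumed values).
protocols: Dict[str, List[int]] = {
    "broad_regions": [0, 0, 1, 1, 1, 1],
    "fine_anatomy": [0, 0, 1, 1, 2, 2],   # splits region 1 of broad_regions
    "tissue_type": [0, 1, 1, 1, 1, 0],    # a different, non-nested partition
}

fine_refines_broad = refines(protocols["fine_anatomy"], protocols["broad_regions"])
tissue_refines_broad = refines(protocols["tissue_type"], protocols["broad_regions"])
```

Here `fine_anatomy` subdivides `broad_regions`, while `tissue_type` cuts across it, so both are valid multi-label segmentations of the same image even though neither is the single correct answer.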