🤖 AI Summary
This study addresses partial occlusion in clinical endoscopic images, where surgical instruments or overlapping tissue hide part of the target. To systematically evaluate the robustness of SAM-family models under such conditions, the authors introduce OccSAM-Bench, a benchmark that synthesizes two occlusion types (surgical tool overlay and cutout) at three severity levels on three public polyp datasets. They propose a three-region evaluation protocol that scores the full target, the visible region, and the occluded region separately. The protocol separates the models into two groups: Occluder-Aware models, which delineate visible tissue and reject instruments, and Occluder-Agnostic models, which confidently predict into occluded areas. SAM-Med2D fits neither group and underperforms across all conditions. The results suggest that model selection for clinical use under occlusion should depend on whether conservative visible-tissue segmentation or amodal inference of hidden anatomy is required.
📝 Abstract
Occlusion, where target structures are partially hidden by surgical instruments or overlapping tissues, remains a critical yet underexplored challenge for foundation segmentation models in clinical endoscopy. We introduce OccSAM-Bench, a benchmark designed to systematically evaluate SAM-family models under controlled, synthesized surgical occlusion. Our framework simulates two occlusion types (surgical tool overlay and cutout) across three calibrated severity levels on three public polyp datasets. We propose a novel three-region evaluation protocol that decomposes segmentation performance into full, visible-only, and invisible (occluded) target regions. This protocol exposes behaviors that standard amodal evaluation obscures, revealing two distinct model archetypes: Occluder-Aware models (SAM, SAM 2, SAM 3, MedSAM3), which prioritize visible tissue delineation and reject instruments, and Occluder-Agnostic models (MedSAM, MedSAM2), which confidently predict into occluded regions. SAM-Med2D aligns with neither and underperforms across all conditions. Ultimately, our results demonstrate that occlusion robustness is not uniform across architectures, and that model selection must be driven by specific clinical intent: whether to prioritize conservative visible-tissue segmentation or the amodal inference of hidden anatomy.
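To make the three-region idea concrete, the following is a minimal sketch of how one prediction could be scored on the full target, the visible part, and the occluded part. The abstract does not give the exact metric definitions, so everything here is an assumption for illustration: the use of Dice, the restriction of each score to pixels outside or inside the occluder, the rectangular `cutout` occluder, and all function names. Masks are plain nested lists to keep the example dependency-free; a real pipeline would more likely use array types.

```python
from typing import Dict, List, Optional, Tuple

Mask = List[List[int]]  # 2D binary mask, 1 = foreground


def cutout(shape: Tuple[int, int], top: int, left: int,
           height: int, width: int) -> Mask:
    """Hypothetical rectangular occluder mask (1 = occluded pixel)."""
    rows, cols = shape
    return [[1 if top <= r < top + height and left <= c < left + width else 0
             for c in range(cols)]
            for r in range(rows)]


def dice(pred: Mask, target: Mask, region: Optional[Mask] = None) -> float:
    """Dice score between pred and target, counted only where region is 1.

    With region=None the whole image is used. Returns 1.0 when both masks
    are empty inside the region (an assumed convention).
    """
    inter = pred_sum = target_sum = 0
    for r, target_row in enumerate(target):
        for c, t in enumerate(target_row):
            if region is not None and not region[r][c]:
                continue
            p = pred[r][c]
            inter += p * t
            pred_sum += p
            target_sum += t
    denom = pred_sum + target_sum
    return 1.0 if denom == 0 else 2.0 * inter / denom


def three_region_dice(pred: Mask, full_target: Mask,
                      occluder: Mask) -> Dict[str, float]:
    """Score a prediction on the full, visible, and occluded target regions."""
    visible_region = [[1 - o for o in row] for row in occluder]
    return {
        "full": dice(pred, full_target),
        "visible": dice(pred, full_target, visible_region),
        "invisible": dice(pred, full_target, occluder),
    }


# Toy case: a 2x4 target on a 4x6 image, with the right half occluded.
target = [[1 if 1 <= r <= 2 and 1 <= c <= 4 else 0 for c in range(6)]
          for r in range(4)]
occluder = cutout((4, 6), top=0, left=3, height=4, width=3)

# A conservative prediction keeps only the unoccluded part of the target.
conservative = [[t * (1 - o) for t, o in zip(t_row, o_row)]
                for t_row, o_row in zip(target, occluder)]

print(three_region_dice(conservative, target, occluder))
```

In this toy case the conservative prediction is perfect on the visible region, scores zero on the occluded region, and lands in between on the full target, which is the kind of difference a single full-target score would hide.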