🤖 AI Summary
Surgical segmentation faces the dual challenges of scarce annotations and cross-procedure semantic inconsistency, which limit the generalizability of existing fine-tuning approaches built on natural-image foundation models (e.g., SAM). To address this, we propose a hierarchical concept evolution pretraining paradigm: (1) constructing a Laparoscopic Concept Hierarchy (LCH) to unify multi-granularity semantics across anatomical structures, tissues, and instruments; (2) designing a confidence-driven iterative pseudo-labeling mechanism that generates and filters pseudo-labels in a self-supervised manner; and (3) introducing a hierarchical mask decoder with parent-child query embeddings, enabling pretraining on large-scale unlabeled laparoscopic images. We further establish LapBench-114K, a new benchmark dataset. Experiments demonstrate significant improvements over state-of-the-art methods across multiple surgical segmentation tasks, achieving, for the first time, granularity-adaptive universal surgical segmentation with strong generalizability and clinical deployment potential.
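The parent-child query embeddings mentioned above can be illustrated with a minimal sketch: each concept in the hierarchy gets its own embedding, and a child concept's decoder query is composed from its own embedding plus its parent's, so sibling concepts (e.g., liver and gallbladder under anatomy) share parent-level semantics. The concept names, dimension, and additive composition here are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

# Hypothetical slice of a laparoscopic concept hierarchy:
# child concept -> parent concept (None = root).
HIERARCHY = {
    "anatomy": None,
    "instrument": None,
    "liver": "anatomy",
    "gallbladder": "anatomy",
    "grasper": "instrument",
}

def build_queries(dim=16, seed=0):
    """Compose each child query as (own embedding + parent embedding).

    Returns both the composed queries and the base embeddings so the
    parent-child relationship is inspectable. In a real model the base
    embeddings would be learnable parameters of the mask decoder.
    """
    rng = np.random.default_rng(seed)
    base = {name: rng.standard_normal(dim) for name in HIERARCHY}
    queries = {}
    for name, parent in HIERARCHY.items():
        q = base[name].copy()
        if parent is not None:
            q += base[parent]  # inject parent semantics into the child query
        queries[name] = q
    return queries, base
```

This additive composition is only one way to tie child queries to their parents; the key property it demonstrates is that semantics flow down the hierarchy, which is what enables cross-granularity consistency checks during pseudo-labeling.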
📝 Abstract
Surgical segmentation is pivotal for scene understanding yet remains hindered by annotation scarcity and semantic inconsistency across diverse procedures. Existing approaches typically fine-tune natural foundation models (e.g., SAM) with limited supervision, functioning merely as domain adapters rather than surgical foundation models. Consequently, they struggle to generalize across the vast variability of surgical targets. To bridge this gap, we present LapFM, a foundation model designed to evolve robust segmentation capabilities from massive unlabeled surgical images. Distinct from medical foundation models that rely on inefficient self-supervised proxy tasks, LapFM leverages a Hierarchical Concept Evolving Pre-training paradigm. First, we establish a Laparoscopic Concept Hierarchy (LCH) via a hierarchical mask decoder with parent-child query embeddings, unifying diverse entities (i.e., Anatomy, Tissue, and Instrument) into a scalable knowledge structure with cross-granularity semantic consistency. Second, we propose a Confidence-driven Evolving Labeling strategy that iteratively generates and filters pseudo-labels based on hierarchical consistency, progressively incorporating reliable samples from unlabeled images into training. This process yields LapBench-114K, a large-scale benchmark comprising 114K image-mask pairs. Extensive experiments demonstrate that LapFM significantly outperforms state-of-the-art methods, establishing new standards for granularity-adaptive generalization in universal laparoscopic segmentation. The source code is available at https://github.com/xq141839/LapFM.
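The confidence-driven, hierarchy-aware filtering step can be sketched as follows. A candidate pseudo-label is kept only if its foreground confidence is high and its mask is (almost) contained in the predicted mask of its parent concept, i.e., the prediction respects the concept hierarchy. The function name, thresholds, and containment measure are assumptions for illustration, not the authors' exact criterion.

```python
import numpy as np

def filter_pseudo_label(child_prob, parent_prob,
                        conf_thresh=0.9, containment_thresh=0.95):
    """Accept or reject one pseudo-label for a child concept.

    child_prob, parent_prob: (H, W) per-pixel foreground probabilities
    for a child concept and its parent in the hierarchy.
    Returns the binary child mask if accepted, else None (the sample is
    deferred to a later round of the evolving-labeling loop).
    """
    child_mask = child_prob > 0.5
    parent_mask = parent_prob > 0.5
    if not child_mask.any():
        return None  # no foreground predicted at all

    # (a) confidence: mean probability over the predicted foreground
    confidence = child_prob[child_mask].mean()
    # (b) hierarchical consistency: fraction of the child mask that
    # lies inside the parent mask
    containment = (child_mask & parent_mask).sum() / child_mask.sum()

    if confidence >= conf_thresh and containment >= containment_thresh:
        return child_mask  # reliable: add to the training pool
    return None
```

Iterating this accept/defer decision over the unlabeled pool, retraining, and re-predicting is what progressively grows the labeled set toward a corpus like LapBench-114K.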