🤖 AI Summary
Existing surgical instrument segmentation methods treat instrument-level instance segmentation (IIS) and part-level semantic segmentation (PSS) as separate tasks. They ignore the structural dependencies between the two and therefore cannot perform unified part-aware instance segmentation (PIS). To address this, we propose SurgPIS, the first PIS model designed specifically for surgical instruments. SurgPIS introduces part-specific queries and explicitly models the hierarchical relationship between instrument instances and their constituent parts. Built on a transformer-based mask classification architecture, it uses a weakly supervised learning strategy that learns jointly from disjoint datasets carrying either instance-level or part-level annotations. Training combines a prediction-aggregation loss with a student-teacher consistency constraint. Extensive experiments on multiple datasets show that SurgPIS achieves state-of-the-art performance on all four related tasks: PIS, IIS, PSS, and instrument-level semantic segmentation.
📝 Abstract
Consistent surgical instrument segmentation is critical for automation in robot-assisted surgery. Yet existing methods treat instrument-level instance segmentation (IIS) and part-level semantic segmentation (PSS) separately, with no interaction between the two tasks. In this work, we formulate surgical tool segmentation as a unified part-aware instance segmentation (PIS) problem and introduce SurgPIS, the first PIS model for surgical instruments. Our method adopts a transformer-based mask classification approach and introduces part-specific queries derived from instrument-level object queries, explicitly linking parts to their parent instrument instances. To address the lack of large-scale datasets with both instance- and part-level labels, we propose a weakly supervised learning strategy that lets SurgPIS learn from disjoint datasets labelled for either IIS or PSS. During training, we aggregate our PIS predictions into IIS or PSS masks, which allows us to compute a loss against partially labelled datasets. A student-teacher approach maintains prediction consistency for the PIS information that is missing from the partially labelled data, e.g., the part labels in IIS-labelled data. Extensive experiments across multiple datasets validate the effectiveness of SurgPIS, which achieves state-of-the-art performance in PIS as well as in IIS, PSS, and instrument-level semantic segmentation.
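The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `aggregate_pis`, the boolean mask layout `(num_instances, num_parts, H, W)`, and the two-part example (shaft, tip) are all assumptions made for illustration. The idea shown is only that a part-aware instance prediction can be collapsed over parts to give per-instrument masks (IIS) or over instances to give per-part masks (PSS), so that whichever label type a dataset provides can be compared against it.

```python
import numpy as np


def aggregate_pis(pis_masks: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Collapse part-aware instance masks into IIS and PSS masks.

    pis_masks: boolean array of shape (num_instances, num_parts, H, W),
    where pis_masks[i, p] marks the pixels of part p of instrument i.
    This layout is a hypothetical one chosen for the sketch.
    """
    # IIS: one mask per instrument, the union of all its parts.
    iis_masks = pis_masks.any(axis=1)  # (num_instances, H, W)
    # PSS: one mask per part class, the union over all instruments.
    pss_masks = pis_masks.any(axis=0)  # (num_parts, H, W)
    return iis_masks, pss_masks


# Toy example: 2 instruments, 2 parts (0 = shaft, 1 = tip), a 1x4 image.
pis = np.zeros((2, 2, 1, 4), dtype=bool)
pis[0, 0, 0, 0] = True  # instrument 0, shaft
pis[0, 1, 0, 1] = True  # instrument 0, tip
pis[1, 0, 0, 2] = True  # instrument 1, shaft
pis[1, 1, 0, 3] = True  # instrument 1, tip

iis, pss = aggregate_pis(pis)
print(iis[:, 0].astype(int))  # per-instrument masks along the single row
print(pss[:, 0].astype(int))  # per-part masks along the single row
```

In an actual training loop the aggregation would presumably act on soft, differentiable predictions rather than hard boolean masks; the hard-mask version here is only meant to make the relationship between the three label types concrete.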