🤖 AI Summary
To address the scarcity of high-quality annotations and the trade-off between semantic richness and labeling efficiency in laparoscopic surgical instrument localization, this paper proposes a novel collaborative annotation paradigm integrating skeletal pose estimation and instance segmentation. Based on this paradigm, we introduce ROBUST-MIPS—the first large-scale, open-source dataset supporting joint modeling of both tasks—alongside a lightweight annotation tool and a unified benchmark model capable of end-to-end pose and segmentation prediction. Experiments demonstrate that pose guidance significantly improves localization accuracy and cross-domain generalization, achieving state-of-the-art performance across multiple metrics. All data, code, models, and evaluation frameworks are publicly released to enable reproducible and scalable research in surgical instrument perception.
📝 Abstract
Localisation of surgical tools constitutes a foundational building block for computer-assisted interventional technologies. Works in this field typically focus on training deep learning models to perform segmentation tasks. Performance of learning-based approaches is limited by the availability of diverse annotated data. We argue that skeletal pose annotations are a more efficient annotation approach for surgical tools, striking a balance between richness of semantic information and ease of annotation, thus allowing for accelerated growth of available annotated data. To encourage adoption of this annotation style, we present ROBUST-MIPS, a combined tool pose and tool instance segmentation dataset derived from the existing ROBUST-MIS dataset. Our enriched dataset facilitates the joint study of these two annotation styles and allows head-to-head comparison on various downstream tasks. To demonstrate the adequacy of pose annotations for surgical tool localisation, we set up a simple benchmark using popular pose estimation methods and observe high-quality results. To ease adoption, together with the dataset, we release our benchmark models and custom tool pose annotation software.