🤖 AI Summary
Pixel-level segmentation of hands and surgical tools in first-person videos of open surgery remains an underexplored challenge. Method: This paper introduces the first hand-tool interaction segmentation task tailored to this domain and presents EgoSurgery-HTS, the first fine-grained egocentric hand-tool segmentation dataset, featuring pixel-level annotations for 14 tool categories, both hands, and hand-tool interaction regions. A unified three-level (tool/hand/interaction) annotation framework is proposed, with annotations rigorously refined by experts through multiple validation rounds. Benchmark evaluations are conducted on mainstream instance segmentation models, including Mask R-CNN. Contribution/Results: Training on EgoSurgery-HTS substantially improves hand and hand-tool segmentation accuracy over existing datasets, raising mAP by 12.6%. The EgoSurgery-HTS dataset is publicly released, serving as a foundational resource for visual understanding in open surgery.
📝 Abstract
Egocentric open-surgery videos capture rich, fine-grained details essential for accurately modeling surgical procedures and human behavior in the operating room. A detailed, pixel-level understanding of hands and surgical tools is crucial for interpreting a surgeon's actions and intentions. We introduce EgoSurgery-HTS, a new dataset with pixel-wise annotations and a benchmark suite for segmenting surgical tools, hands, and interacting tools in egocentric open-surgery videos. Specifically, we provide a labeled dataset for (1) tool instance segmentation of 14 distinct surgical tools, (2) hand instance segmentation, and (3) hand-tool segmentation to label hands and the tools they manipulate. Using EgoSurgery-HTS, we conduct extensive evaluations of state-of-the-art segmentation methods and demonstrate significant improvements in the accuracy of hand and hand-tool segmentation in egocentric open-surgery videos compared to existing datasets. The dataset will be released at https://github.com/Fujiry0/EgoSurgery.