FASL-Seg: Anatomy and Tool Segmentation of Surgical Scenes

📅 2025-09-07
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
In surgical scene semantic segmentation, existing methods often neglect anatomical structures and struggle to jointly model low- and high-level features. To address these challenges, the authors propose the Feature-Adaptive Spatial Localization model (FASL-Seg), which processes features in two parallel streams: a Low-Level Feature Projection (LLFP) stream for edge detail and a High-Level Feature Projection (HLFP) stream for context. By handling each feature resolution in its own stream, FASL-Seg segments both anatomical structures and surgical instruments pixel-wise within a single framework. On EndoVis18 parts and anatomy segmentation the model reaches an mIoU of 72.71%, improving on the prior state of the art by 5%. On tool type segmentation it reaches an mIoU of 85.61% on EndoVis18 and 72.78% on EndoVis17, exceeding the state of the art in overall performance while remaining comparable per class. These results support the use of distinct processing streams for different feature resolutions as a basis for vision-based scene understanding in minimally invasive surgery.

📝 Abstract
The growing popularity of robotic minimally invasive surgeries has made deep learning-based surgical training a key area of research. A thorough understanding of the surgical scene components is crucial, which semantic segmentation models can help achieve. However, most existing work focuses on surgical tools and overlooks anatomical objects. Additionally, current state-of-the-art (SOTA) models struggle to balance capturing high-level contextual features and low-level edge features. We propose a Feature-Adaptive Spatial Localization model (FASL-Seg), designed to capture features at multiple levels of detail through two distinct processing streams for varying feature resolutions, namely a Low-Level Feature Projection (LLFP) stream and a High-Level Feature Projection (HLFP) stream, enabling precise segmentation of anatomy and surgical instruments. We evaluated FASL-Seg on the surgical segmentation benchmark datasets EndoVis18 and EndoVis17 on three use cases. The FASL-Seg model achieves a mean Intersection over Union (mIoU) of 72.71% on parts and anatomy segmentation in EndoVis18, improving on SOTA by 5%. It further achieves an mIoU of 85.61% and 72.78% in EndoVis18 and EndoVis17 tool type segmentation, respectively, outperforming SOTA in overall performance. Per-class results are comparable to SOTA on both datasets and consistent across anatomy and instrument classes, demonstrating the effectiveness of distinct processing streams for varying feature resolutions.
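The mIoU figures quoted above are averages of per-class Intersection over Union. As a rough illustration of the metric only, not of the paper's evaluation protocol, the sketch below computes it over flat lists of class labels. The function name `mean_iou`, the toy labels, and the choice to skip classes absent from both prediction and ground truth are assumptions made for this example; the EndoVis benchmarks define their own averaging rules.

```python
def mean_iou(pred, target, num_classes):
    """Mean of per-class IoU over flat label sequences.

    Assumption for this sketch: a class that appears in neither
    `pred` nor `target` is skipped instead of counted as 0 or 1.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 6-pixel example with three hypothetical classes (0, 1, 2).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
# Per-class IoU: class 0 = 1/3, class 1 = 2/3, class 2 = 1/2.
print(mean_iou(pred, target, num_classes=3))
```

Because each class contributes equally to the average regardless of how many pixels it covers, small structures such as instrument tips weigh as much as large anatomical regions in the final score.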
Problem

Research questions and friction points this paper is trying to address.

Segments anatomy and surgical tools in robotic surgeries
Balances high-level context and low-level edge features
Improves semantic segmentation precision in surgical scenes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-stream architecture for multi-level feature capture
Low-Level Feature Projection for precise edge details
High-Level Feature Projection for contextual understanding
Authors

Muraam Abdel-Ghani
Department of Surgery, Hamad Medical Corporation

Mahmoud Ali
Indiana University
Robotics, Autonomous Navigation

Mohamed Ali
University of Washington Tacoma
Database systems, Data Stream Systems, Geographic Information Systems, Spatiotemporal databases

Fatmaelzahraa Ahmed
Department of Surgery, Hamad Medical Corporation

Mohamed Arsalan
College of Engineering, Qatar University

Abdulaziz Al-Ali
Qatar University
Machine Learning, Artificial Neural Networks, Applied Artificial Intelligence

Shidin Balakrishnan
Department of Surgery, Hamad Medical Corporation