Surg-SegFormer: A Dual Transformer-Based Model for Holistic Surgical Scene Segmentation

📅 2025-07-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Real-time, full-scene semantic segmentation in robot-assisted surgery (RAS) is hindered by limited expert supervision and by existing methods' reliance on manual prompts, which makes them impractical for long surgical videos. To address these challenges, this paper proposes Surg-SegFormer, a prompt-free segmentation framework. It employs a dual-Transformer encoder-decoder architecture that leverages self-attention to model global contextual dependencies, enabling end-to-end, fully automatic segmentation of anatomical tissues, surgical instruments, and critical vascular structures. Evaluated on EndoVis2018 and EndoVis2017, Surg-SegFormer achieves mean Intersection-over-Union (mIoU) scores of 0.80 and 0.54, respectively, outperforming state-of-the-art approaches. The model demonstrates strong robustness, high automation, and clinical applicability, alleviating the scarcity of expert teaching resources and improving the efficiency of intraoperative scene understanding.
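The mIoU scores quoted above average the per-class Intersection-over-Union between predicted and ground-truth label maps. The following is an illustrative sketch of that metric, not the authors' evaluation code; the function name and the toy label maps are hypothetical.

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Class-averaged IoU over integer label maps of equal shape.

    Classes absent from both prediction and ground truth are skipped
    so they do not distort the average.
    """
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:  # class absent in both masks
            continue
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else 0.0

# Toy 2x3 label maps with two classes (hypothetical example)
pred = np.array([[0, 0, 1], [1, 1, 1]])
gt   = np.array([[0, 1, 1], [1, 1, 0]])
# class 0: intersection 1, union 3 -> 1/3; class 1: intersection 3, union 5 -> 3/5
print(mean_iou(pred, gt, num_classes=2))  # 0.4666...
```

Evaluation protocols differ in how they handle absent classes and background, so reported scores depend on the exact convention used by each benchmark.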

📝 Abstract
Holistic surgical scene segmentation in robot-assisted surgery (RAS) enables surgical residents to identify various anatomical tissues, articulated tools, and critical structures, such as veins and vessels. Given the firm intraoperative time constraints, it is challenging for surgeons to provide detailed real-time explanations of the operative field for trainees. This challenge is compounded by the scarcity of expert surgeons relative to trainees, making the unambiguous delineation of go- and no-go zones inconvenient. Therefore, high-performance semantic segmentation models offer a solution by providing clear postoperative analyses of surgical procedures. However, recent advanced segmentation models rely on user-generated prompts, rendering them impractical for lengthy surgical videos that commonly exceed an hour. To address this challenge, we introduce Surg-SegFormer, a novel prompt-free model that outperforms current state-of-the-art techniques. Surg-SegFormer attained a mean Intersection over Union (mIoU) of 0.80 on the EndoVis2018 dataset and 0.54 on the EndoVis2017 dataset. By providing robust and automated surgical scene comprehension, this model significantly reduces the tutoring burden on expert surgeons, empowering residents to independently and effectively understand complex surgical environments.
Problem

Research questions and friction points this paper is trying to address.

Automates surgical scene segmentation in robot-assisted surgery
Reduces reliance on expert surgeons for real-time explanations
Eliminates need for user prompts in lengthy surgical videos
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual Transformer-Based Model for segmentation
Prompt-free surgical scene segmentation
Automated comprehension of surgical environments
Fatimaelzahraa Ahmed
Department of Surgery, Hamad Medical Corporation, Doha, P.O. Box 3050, Qatar
Muraam Abdel-Ghani
Department of Surgery, Hamad Medical Corporation, Doha, P.O. Box 3050, Qatar
Muhammad Arsalan
College of Engineering, Qatar University, Doha, P.O. Box 2713, Qatar
Mahmoud Ali
Indiana University
Robotics, Autonomous Navigation
Abdulaziz Al-Ali
Qatar University
Machine Learning, Artificial Neural Networks, Applied Artificial Intelligence
Shidin Balakrishnan
Department of Surgery, Hamad Medical Corporation, Doha, P.O. Box 3050, Qatar