Geo-RepNet: Geometry-Aware Representation Learning for Surgical Phase Recognition in Endoscopic Submucosal Dissection

📅 2025-07-12
🤖 AI Summary
Problem: Endoscopic submucosal dissection (ESD) is difficult for surgical phase recognition because different phases look visually similar and RGB images carry few structural cues. Method: This work uses depth maps as an auxiliary modality, proposing a Depth-Guided Geometric Prior Generation (DGPG) module and a Geometry-Enhanced Multi-scale Attention (GEMA) mechanism, built on geometry-aware cross-attention, to enable structure-aware representation learning. Built upon a re-parameterizable RepVGG backbone, the method fuses RGB and depth information to encode geometric cues about anatomical structure. Results: Evaluated on a newly constructed nine-phase ESD dataset with dense frame-level annotations, the approach is reported to achieve state-of-the-art performance while remaining robust and computationally efficient in complex, low-texture surgical scenes. Contribution: The authors describe the work as pioneering the use of depth information for surgical phase recognition, delivered as a geometry-aware recognition framework.

📝 Abstract
Surgical phase recognition plays a critical role in developing intelligent assistance systems for minimally invasive procedures such as Endoscopic Submucosal Dissection (ESD). However, the high visual similarity across different phases and the lack of structural cues in RGB images pose significant challenges. Depth information offers valuable geometric cues that can complement appearance features by providing insights into spatial relationships and anatomical structures. In this paper, we pioneer the use of depth information for surgical phase recognition and propose Geo-RepNet, a geometry-aware convolutional framework that integrates RGB images and depth information to enhance recognition performance in complex surgical scenes. Built upon a re-parameterizable RepVGG backbone, Geo-RepNet incorporates a Depth-Guided Geometric Prior Generation (DGPG) module, which extracts geometry priors from raw depth maps, and a Geometry-Enhanced Multi-scale Attention (GEMA) module, which injects spatial guidance through geometry-aware cross-attention and efficient multi-scale aggregation. To evaluate the effectiveness of our approach, we construct a nine-phase ESD dataset with dense frame-level annotations from real-world ESD videos. Extensive experiments on the proposed dataset demonstrate that Geo-RepNet achieves state-of-the-art performance while maintaining robustness and high computational efficiency in complex and low-texture surgical environments.
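
The abstract mentions a re-parameterizable RepVGG backbone without describing it further. As a rough illustration of what re-parameterization generally means, the NumPy sketch below folds a 3x3 branch, a 1x1 branch, and an identity branch into one 3x3 kernel and checks that the outputs match. It is a minimal sketch under stated assumptions, not the authors' implementation: batch-norm folding and the nonlinearity are omitted, and the helper names `conv2d_same` and `merge_branches` are hypothetical.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive stride-1, zero-padded convolution (cross-correlation).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, h, wd = x.shape
    y = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Contribution of kernel tap (i, j): mix channels at a shifted window.
            y += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return y

def merge_branches(w3, w1):
    """Fold a 1x1 branch and an identity branch into a single 3x3 kernel."""
    c_out, c_in = w3.shape[:2]
    assert c_out == c_in, "the identity branch needs matching channel counts"
    merged = w3.copy()
    merged[:, :, 1, 1] += w1[:, :, 0, 0]  # a 1x1 kernel acts only at the centre tap
    merged[:, :, 1, 1] += np.eye(c_out)   # identity = 1x1 conv with an identity matrix
    return merged

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))        # toy feature map, 4 channels
w3 = rng.normal(size=(4, 4, 3, 3))    # 3x3 branch weights
w1 = rng.normal(size=(4, 4, 1, 1))    # 1x1 branch weights

train_out = conv2d_same(x, w3) + conv2d_same(x, w1) + x  # three-branch form
infer_out = conv2d_same(x, merge_branches(w3, w1))       # single fused conv
max_abs_diff = float(np.max(np.abs(train_out - infer_out)))
```

Because convolution is linear, the two forms agree up to floating-point error, which is why a multi-branch block used during training can be collapsed into a single convolution for inference.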
Problem

Research questions and friction points this paper is trying to address.

Enhance surgical phase recognition using depth and RGB data
Address visual similarity challenges in endoscopic dissection phases
Improve accuracy in low-texture surgical environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates RGB and depth for surgical recognition
Uses Depth-Guided Geometric Prior Generation module
Employs Geometry-Enhanced Multi-scale Attention mechanism
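
The listing does not give the internals of the DGPG or GEMA modules, so the following is only a toy illustration of the general idea of letting depth-derived geometry guide RGB features. It assumes, purely for illustration, that the geometry prior consists of normalized depth plus depth gradients and that fusion is single-head scaled dot-product cross-attention with queries from RGB tokens and keys and values from geometry tokens. The multi-scale aggregation is not modelled, and all function names, shapes, and weights are hypothetical.

```python
import numpy as np

def depth_to_geometry_prior(depth):
    """Hypothetical prior: normalized depth plus its two spatial gradients, shape (3, H, W)."""
    gy, gx = np.gradient(depth)
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    return np.stack([d, gx, gy])

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(rgb_tokens, geo_tokens, wq, wk, wv):
    """Queries from RGB tokens (N, Dr); keys and values from geometry tokens (N, Dg)."""
    q = rgb_tokens @ wq
    k = geo_tokens @ wk
    v = geo_tokens @ wv                      # projected to Dr so the residual add works
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return rgb_tokens + attn @ v, attn       # residual connection keeps the RGB content

rng = np.random.default_rng(0)
rgb_feat = rng.normal(size=(16, 6, 6))       # toy RGB feature map: 16 channels, 6x6
depth = rng.uniform(0.1, 1.0, size=(6, 6))   # toy depth map at the same resolution

geo = depth_to_geometry_prior(depth)
rgb_tokens = rgb_feat.reshape(16, -1).T      # (36, 16): one token per spatial location
geo_tokens = geo.reshape(3, -1).T            # (36, 3)

wq = rng.normal(size=(16, 8)) * 0.1          # random stand-ins for learned projections
wk = rng.normal(size=(3, 8)) * 0.1
wv = rng.normal(size=(3, 16)) * 0.1

fused, attn = cross_attention(rgb_tokens, geo_tokens, wq, wk, wv)
```

Here each RGB location attends over every geometry token, so structural cues from depth can reweight appearance features even where the RGB texture is weak.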
Authors

Rui Tang
Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China

Haochen Yin
Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China

Guankun Wang
The Chinese University of Hong Kong
Computer vision, Image analysis

Long Bai
Research Assistant, Institute of Computing Technology, Chinese Academy of Sciences
Event-Centric Analysis, Knowledge Graph, Natural Language Processing

An Wang
Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China

Huxin Gao
CUHK | NUS | WHU
Surgical robotics, machine/deep learning in robotics

Jiazheng Wang
Theory Lab, Central Research Institute, 2012 Labs, Huawei Technologies Co. Ltd., Hong Kong SAR, China

Hongliang Ren
Chinese University of Hong Kong | National University of Singapore | JHU/Harvard(RF) | CUHK(PhD)
Biorobotics & intelligent systems, medical mechatronics, continuum, soft flexible robots/sensors, multisensory perception