EndoSERV: A Vision-based Endoluminal Robot Navigation System

📅 2026-03-09
🤖 AI Summary
This work addresses the challenge of navigating endoluminal robots through complex, narrow, and tortuous anatomical pathways, where existing visual localization methods suffer from limited accuracy due to tissue deformation, in vivo artifacts, and a lack of distinctive visual features. To overcome these limitations, the authors propose EndoSERV, a novel approach that integrates segmented structured modeling with real-to-synthetic domain transfer learning without requiring ground-truth pose labels in real data. The method partitions long endoluminal trajectories into shorter segments for independent visual odometry estimation and leverages offline pretraining to extract texture-invariant features. During inference, it adaptively maps real-image features into a synthetic domain where pose supervision is available, enabling optimization using synthetic ground-truth poses. Experiments demonstrate that EndoSERV achieves high-precision and robust navigation on both public and clinical datasets, even in the absence of real-world pose annotations.

📝 Abstract
Robot-assisted endoluminal procedures are increasingly used for early cancer intervention. However, the intricate, narrow and tortuous pathways within the luminal anatomy pose substantial difficulties for robot navigation. Vision-based navigation offers a promising solution, but existing localization approaches are error-prone due to tissue deformation, in vivo artifacts and a lack of distinctive landmarks for consistent localization. This paper presents a novel EndoSERV localization method to address these challenges. It includes two main parts, i.e., SEgment-to-structure and Real-to-Virtual mapping, and hence the name. For long-range and complex luminal structures, we divide them into smaller sub-segments and estimate the odometry independently. To cater for label insufficiency, an efficient transfer technique maps real image features to the virtual domain to use virtual pose ground truth. The training phases of EndoSERV include an offline pretraining to extract texture-agnostic features, and an online phase that adapts to real-world conditions. Extensive experiments based on both public and clinical datasets have been performed to demonstrate the effectiveness of the method even without any real pose labels.
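
The segment-wise odometry idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes planar poses (x, y, theta) in place of full 6-DoF camera poses, takes the per-frame relative motions as given inputs where the real system would use a learned estimator, and uses hypothetical helper names (`compose`, `split_into_segments`, `integrate_segment`, `stitch_segments`). It only shows how odometry integrated independently inside each sub-segment can be chained back into one trajectory.

```python
import math

def compose(a, b):
    """Compose two planar poses (x, y, theta): express pose b in the frame of pose a."""
    ax, ay, at = a
    bx, by, bt = b
    return (ax + math.cos(at) * bx - math.sin(at) * by,
            ay + math.sin(at) * bx + math.cos(at) * by,
            at + bt)

def split_into_segments(steps, segment_len):
    """Partition a long list of relative motions into shorter sub-segments."""
    return [steps[i:i + segment_len] for i in range(0, len(steps), segment_len)]

def integrate_segment(steps):
    """Accumulate relative motions inside one segment, starting at its local origin."""
    pose = (0.0, 0.0, 0.0)
    local_poses = []
    for step in steps:
        pose = compose(pose, step)
        local_poses.append(pose)
    return local_poses

def stitch_segments(local_segments):
    """Chain independently integrated segments into one global trajectory."""
    anchor = (0.0, 0.0, 0.0)  # global pose at the start of the current segment
    trajectory = []
    for local_poses in local_segments:
        trajectory.extend([compose(anchor, p) for p in local_poses])
        if local_poses:
            anchor = compose(anchor, local_poses[-1])
    return trajectory

# Hypothetical input: four identical steps, each 1 unit forward plus a 90 degree turn.
steps = [(1.0, 0.0, math.pi / 2)] * 4
segments = split_into_segments(steps, segment_len=2)
trajectory = stitch_segments([integrate_segment(s) for s in segments])
```

In this toy setting the stitched trajectory matches what integrating all steps in one pass would give. The potential benefit of segmenting in practice is that each sub-segment can be estimated, and its estimator adapted, separately from the others.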
Problem

Research questions and friction points this paper is trying to address.

endoluminal navigation
vision-based localization
tissue deformation
lack of landmarks
robot-assisted intervention
Innovation

Methods, ideas, or system contributions that make the work stand out.

EndoSERV
vision-based navigation
real-to-virtual mapping
segment-to-structure
label-efficient localization
Authors
Junyang Wu
Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China, 200240
Fangfang Xie
Shanghai Chest Hospital, Shanghai, China
Minghui Zhang
Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China, 200240
Hanxiao Zhang
Nanjing University
Jiayuan Sun
Shanghai Chest Hospital, Shanghai, China
Yun Gu
Shanghai Jiao Tong University
Medical Image Analysis, Computer-Assisted Intervention
Guang-Zhong Yang
Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China, 200240