Reinforcement Learning for Follow-the-Leader Robotic Endoscopic Navigation via Synthetic Data

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work proposes a follower-type flexible continuum endoscopic robot to address the challenge of frequent collisions with lumen walls during autonomous navigation in confined tubular environments. By integrating monocular depth estimation with deep reinforcement learning, the system leverages synthetic data generated via NVIDIA Replicator in a high-fidelity intestinal simulation environment built on NVIDIA Omniverse. The Depth Anything model is fine-tuned using this synthetic data to enhance 3D perception accuracy, while a geometry-aware reward mechanism is designed to enable precise lumen tracking. To the best of our knowledge, this study presents the first integration of synthetic data–driven depth estimation with reinforcement learning for endoscopic navigation, achieving a 39.2% improvement in δ₁ depth accuracy over the original model and reducing the navigation J-index by 0.67 compared to the next-best method, thereby significantly enhancing obstacle avoidance capability and system robustness.
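The δ₁ figure quoted above is the standard monocular-depth accuracy metric: the fraction of pixels whose predicted depth is within a factor of 1.25 of the ground truth. A minimal sketch of how it is typically computed (the paper does not show its evaluation code, so this is the conventional definition, not the authors' implementation):

```python
import numpy as np

def delta1_accuracy(pred, gt, threshold=1.25):
    """Standard delta_1 metric: fraction of valid pixels where
    max(pred/gt, gt/pred) < threshold (conventionally 1.25)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    valid = gt > 0  # ignore pixels without ground-truth depth
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float(np.mean(ratio < threshold))

# Toy example: three of these four pixels fall within the 1.25 factor
pred = np.array([1.0, 2.0, 3.0, 10.0])
gt   = np.array([1.1, 2.1, 2.9, 4.0])
print(delta1_accuracy(pred, gt))  # → 0.75
```

The reported 39.2% improvement refers to the gain in this ratio after fine-tuning Depth Anything on the synthetic intraluminal images.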

📝 Abstract
Autonomous navigation is crucial for both medical and industrial endoscopic robots, enabling safe and efficient exploration of narrow tubular environments without continuous human intervention, where avoiding contact with the inner walls has been a longstanding challenge for prior approaches. We present a follow-the-leader endoscopic robot based on a flexible continuum structure designed to minimize contact between the endoscope body and intestinal walls, thereby reducing patient discomfort. To achieve this objective, we propose a vision-based deep reinforcement learning framework guided by monocular depth estimation. A realistic intestinal simulation environment was constructed in NVIDIA Omniverse to train and evaluate autonomous navigation strategies. Furthermore, thousands of synthetic intraluminal images were generated using NVIDIA Replicator to fine-tune the Depth Anything model, enabling dense three-dimensional perception of the intestinal environment with a single monocular camera. Subsequently, we introduce a geometry-aware reward and penalty mechanism to enable accurate lumen tracking. Compared with the original Depth Anything model, our method improves δ₁ depth accuracy by 39.2% and reduces the navigation J-index by 0.67 relative to the second-best method, demonstrating the robustness and effectiveness of the proposed approach.
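The abstract's "geometry-aware reward and penalty mechanism" combines a reward for tracking the lumen with a penalty for approaching the wall, both derived from the estimated depth map. The paper's exact formulation is not given here, so the following is a purely hypothetical sketch of such a mechanism: it treats the deepest image region as the lumen direction and the minimum depth as wall proximity; the function name, thresholds, and weights are illustrative assumptions.

```python
import numpy as np

def lumen_tracking_reward(depth, safe_dist=0.02,
                          center_weight=1.0, wall_weight=1.0):
    """Hypothetical geometry-aware reward from a dense depth map.

    Reward term: keep the lumen (deepest region) near the image
    center. Penalty term: grow as the nearest surface (minimum
    depth) comes within `safe_dist` of the camera. All parameters
    are illustrative, not the paper's.
    """
    h, w = depth.shape
    # Lumen direction: pixel of maximum depth, offset normalized to [0, sqrt(2)]
    iy, ix = np.unravel_index(np.argmax(depth), depth.shape)
    offset = np.hypot((ix - w / 2) / (w / 2), (iy - h / 2) / (h / 2))
    centering = 1.0 - offset              # ~1 when the lumen is centered
    # Wall penalty: positive only when the closest point is inside safe_dist
    wall_penalty = max(0.0, 1.0 - depth.min() / safe_dist)
    return center_weight * centering - wall_weight * wall_penalty
```

With a reward of this shape, the agent is pushed to steer the tip along the lumen axis (follow-the-leader behavior) while being penalized before an actual wall collision occurs, which is consistent with the collision-avoidance goal stated in the abstract.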
Problem

Research questions and friction points this paper is trying to address.

autonomous navigation
endoscopic robot
wall contact avoidance
narrow tubular environments
follow-the-leader
Innovation

Methods, ideas, or system contributions that make the work stand out.

deep reinforcement learning
synthetic data
monocular depth estimation
continuum robot
endoscopic navigation
Sicong Gao
School of Computer Science and Engineering, The University of New South Wales, Sydney 2052, Australia
Chen Qian
School of Mechanical and Manufacturing Engineering, The University of New South Wales, Sydney, NSW 2052, Australia
Laurence Xian
School of Mechanical and Manufacturing Engineering, The University of New South Wales, Sydney, NSW 2052, Australia
Liao Wu
School of Mechanical and Manufacturing Engineering, The University of New South Wales, Sydney, NSW 2052, Australia
M. Pagnucco
School of Computer Science and Engineering, The University of New South Wales, Sydney 2052, Australia
Yang Song
Associate Professor, University of New South Wales
Biomedical Image Analysis · Computer Vision · Machine Learning · Artificial Intelligence