Safe Navigation for Robotic Digestive Endoscopy via Human Intervention-based Reinforcement Learning

πŸ“… 2024-09-24
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the challenge of safe navigation for robotic digestive endoscopes (RDEs) in narrow, unstructured gastrointestinal tracts, this paper proposes HI-PPO, a human-intervention-enhanced Proximal Policy Optimization framework. HI-PPO integrates real-time clinical expert interventions into the reinforcement learning (RL) training loop, combining an Enhanced Exploration Mechanism (EEM), Reward-Penalty Adjustment (RPA), and a Behavior Cloning Similarity (BCS) constraint. Together these components improve policy safety and exploration efficiency. Experiments in a high-fidelity simulation environment show that HI-PPO achieves a mean trajectory error of 8.02 mm and a safety score of 0.862, matching expert-level performance and substantially outperforming conventional intervention-free RL baselines. The framework establishes a verifiable, clinically grounded paradigm for safe RDE navigation, advancing the translational readiness of robotic endoscopy.

πŸ“ Abstract
With the increasing application of automated robotic digestive endoscopy (RDE), ensuring safe and efficient navigation in the unstructured and narrow digestive tract has become a critical challenge. Existing automated reinforcement learning navigation algorithms often result in potentially risky collisions due to the absence of essential human intervention, which significantly limits the safety and effectiveness of RDE in actual clinical practice. To address this limitation, we propose a Human Intervention (HI)-based Proximal Policy Optimization (PPO) framework, dubbed HI-PPO, which incorporates expert knowledge to enhance RDE's safety. Specifically, HI-PPO combines an Enhanced Exploration Mechanism (EEM), Reward-Penalty Adjustment (RPA), and Behavior Cloning Similarity (BCS) to address PPO's exploration inefficiencies for safe navigation in complex gastrointestinal environments. Comparative experiments were conducted on a simulation platform, and the results showed that HI-PPO achieved a mean ATE (Average Trajectory Error) of 8.02 mm and a Security Score of 0.862, demonstrating performance comparable to human experts. The code will be publicly available once this paper is published.
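The abstract describes augmenting PPO's objective with a behavior-cloning similarity term toward expert (intervened) actions. The sketch below is a minimal toy illustration of that general idea, not the paper's actual loss: the function name, the squared-log-probability BC penalty, and all weights are assumptions for illustration only.

```python
import numpy as np

def hi_ppo_loss(ratio, advantage, pi_logprob, expert_logprob,
                clip_eps=0.2, bc_weight=0.5):
    """Toy HI-PPO-style objective (illustrative, not from the paper):
    the standard clipped PPO surrogate plus a behavior-cloning
    similarity penalty pulling the policy toward expert actions."""
    # Clipped PPO surrogate (to be maximized, hence negated below)
    surrogate = np.minimum(
        ratio * advantage,
        np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage,
    )
    # Behavior Cloning Similarity term: penalize divergence between the
    # policy's and the expert's action log-probabilities
    bc_penalty = (pi_logprob - expert_logprob) ** 2
    return -surrogate.mean() + bc_weight * bc_penalty.mean()

# With matching policy/expert log-probs the BC term vanishes and the
# loss reduces to the negated clipped surrogate.
loss = hi_ppo_loss(np.array([2.0]), np.array([1.0]),
                   np.array([0.0]), np.array([0.0]))
```

Here the ratio 2.0 is clipped to 1.2, so the loss is -1.2; in a full implementation the BC term would typically only be applied to transitions where a clinician actually intervened.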
Problem

Research questions and friction points this paper is trying to address.

Ensuring safe robotic endoscopy navigation in narrow digestive tracts
Reducing risky collisions by integrating human expert intervention
Improving reinforcement learning for complex gastrointestinal environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human Intervention-based Proximal Policy Optimization
Enhanced Exploration Mechanism for safe navigation
Reward-Penalty Adjustment with Behavior Cloning
πŸ”Ž Similar Papers
No similar papers found.
Min Tan
Professor of School of Computer Science and Technology, Hangzhou Dianzi University
Machine Learning, Image Processing, Multimedia, Computer Vision
Yushun Tao
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 518055, Shenzhen, China; University of Chinese Academy of Sciences, 101400, Beijing, China
Boyun Zheng
The Chinese University of Hong Kong
Medical image analysis, Deep learning
GaoSheng Xie
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 518055, Shenzhen, China
Lijuan Feng
Department of Gastroenterology and Hepatology, Shenzhen University General Hospital, 518055, Shenzhen, China
Zeyang Xia
School of Mechanical Engineering, Shanghai Jiao Tong University, 200240, Shanghai, China
Jing Xiong
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 518055, Shenzhen, China