Long-Short Term Agents for Pure-Vision Bronchoscopy Robotic Autonomy

📅 2026-03-09
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work proposes a purely vision-based autonomous navigation framework for robot-assisted bronchoscopy, addressing challenges such as limited field of view, dynamic artifacts, and reliance on external localization systems. By integrating preoperative CT scans with real-time endoscopic video, the approach combines a hierarchical long–short-term agent architecture, a vision-based predictive world model, cross-modal alignment between CT and endoscopy, and low-latency motion control to enable long-range navigation without external sensors. The system achieved a 100% navigation success rate in a phantom, 80% in ex vivo porcine lungs (reaching up to eighth-generation bronchi), and in vivo performance comparable to that of expert physicians. This study presents the first preclinical validation of sensor-free visual navigation for bronchoscopic interventions.

πŸ“ Abstract
Accurate intraoperative navigation is essential for robot-assisted endoluminal intervention, but remains difficult because of the limited endoscopic field of view and dynamic artifacts. Existing navigation platforms often rely on external localization technologies, such as electromagnetic tracking or shape sensing, which increase hardware complexity and remain vulnerable to intraoperative anatomical mismatch. We present a vision-only autonomy framework that performs long-horizon bronchoscopic navigation using preoperative CT-derived virtual targets and live endoscopic video, without external tracking during navigation. The framework uses hierarchical long–short-term agents: a short-term reactive agent for continuous low-latency motion control, and a long-term strategic agent for decision support at anatomically ambiguous points. When their recommendations conflict, a world-model critic predicts future visual states for candidate actions and selects the action whose predicted state best matches the target view. We evaluated the system in a high-fidelity airway phantom, three ex vivo porcine lungs, and a live porcine model. The system reached all planned segmental targets in the phantom, maintained 80% success to the eighth generation ex vivo, and achieved in vivo navigation performance comparable to that of an expert bronchoscopist. These results support the preclinical feasibility of sensor-free autonomous bronchoscopic navigation.
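The world-model critic described above can be sketched as a simple arbitration step: when the short-term and long-term agents disagree, each candidate action is rolled forward through a predictive world model, and the action whose predicted view is most similar to the CT-derived target view wins. The sketch below is a hypothetical illustration under strong simplifying assumptions (views as plain feature vectors, a toy world model, cosine similarity as the matching score); none of these names or choices come from the paper itself.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def critic_select(current_view, candidate_actions, world_model, target_view):
    """Predict the next view for each candidate action and return the
    action whose predicted view best matches the target view."""
    scores = [cosine(world_model(current_view, a), target_view)
              for a in candidate_actions]
    return candidate_actions[int(np.argmax(scores))]

# Toy world model (illustrative only): each action shifts the view
# embedding by a fixed direction in a 2-D feature space.
def toy_world_model(view, action):
    shifts = {"advance":     np.array([1.0,  0.0]),
              "steer_left":  np.array([0.0,  1.0]),
              "steer_right": np.array([0.0, -1.0])}
    return view + shifts[action]

current = np.array([0.0, 0.0])
target = np.array([0.0, 1.0])  # target view lies in the "steer_left" direction
best = critic_select(current, ["advance", "steer_left", "steer_right"],
                     toy_world_model, target)
print(best)  # steer_left
```

In practice the views would be learned embeddings of endoscopic frames and the world model a trained video-prediction network, but the arbitration logic stays the same.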
Problem

Research questions and friction points this paper is trying to address.

bronchoscopic navigation
vision-only autonomy
intraoperative navigation
external tracking
anatomical mismatch
Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-only navigation
hierarchical agents
world-model critic
autonomous bronchoscopy
preoperative CT integration
Junyang Wu
Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200240, China.
Mingyi Luo
Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200240, China.
Fangfang Xie
Shanghai Chest Hospital, Shanghai, 10587, China.
Minghui Zhang
Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200240, China.
Hanxiao Zhang
Nanjing University
Chunxi Zhang
Shanghai Chest Hospital, Shanghai, 10587, China.
Junhao Wang
Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200240, China.
Jiayuan Sun
Shanghai Chest Hospital, Shanghai, 10587, China.
Yun Gu
Shanghai Jiao Tong University
Guang-Zhong Yang
Shanghai Chest Hospital, Shanghai, 10587, China.