BronchOpt: Vision-Based Pose Optimization with Fine-Tuned Foundation Models for Accurate Bronchoscopy Navigation

📅 2025-11-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Bronchoscopy navigation suffers from significant localization errors due to respiratory motion, anatomical variability, and inaccuracies in CT-to-surface registration; existing vision-based methods exhibit poor generalizability and substantial alignment errors. This paper introduces the first frame-level 2D–3D visual–CT registration framework for bronchoscopic navigation: a modality-invariant encoder jointly embeds real RGB bronchoscopic video and CT-rendered depth maps, while differentiable rendering enables end-to-end pose optimization. We further present the first publicly available synthetic benchmark dataset, enabling cross-domain training and zero-shot transfer. Trained exclusively on synthetic data, our method achieves 2.65 mm translational and 0.19 rad rotational error on real patient data—outperforming prior approaches by a large margin—without domain adaptation, demonstrating robust cross-patient localization capability.

📝 Abstract
Accurate intra-operative localization of the bronchoscope tip relative to patient anatomy remains challenging due to respiratory motion, anatomical variability, and CT-to-body divergence that cause deformation and misalignment between intra-operative views and pre-operative CT. Existing vision-based methods often fail to generalize across domains and patients, leading to residual alignment errors. This work establishes a generalizable foundation for bronchoscopy navigation through a robust vision-based framework and a new synthetic benchmark dataset that enables standardized and reproducible evaluation. We propose a vision-based pose optimization framework for frame-wise 2D-3D registration between intra-operative endoscopic views and pre-operative CT anatomy. A fine-tuned modality- and domain-invariant encoder enables direct similarity computation between real endoscopic RGB frames and CT-rendered depth maps, while a differentiable rendering module iteratively refines camera poses through depth consistency. To enhance reproducibility, we introduce the first public synthetic benchmark dataset for bronchoscopy navigation, addressing the lack of paired CT-endoscopy data. Trained exclusively on synthetic data distinct from the benchmark, our model achieves an average translational error of 2.65 mm and a rotational error of 0.19 rad, demonstrating accurate and stable localization. Qualitative results on real patient data further confirm strong cross-domain generalization, achieving consistent frame-wise 2D-3D alignment without domain-specific adaptation. Overall, the proposed framework achieves robust, domain-invariant localization through iterative vision-based optimization, while the new benchmark provides a foundation for standardized progress in vision-based bronchoscopy navigation.
Problem

Research questions and friction points this paper is trying to address.

Accurate bronchoscope tip localization during surgery using vision-based methods
Overcoming misalignment caused by CT-to-body divergence and respiratory motion
Creating generalizable navigation across patients without domain-specific adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned modality-invariant encoder for CT-endoscopy similarity
Differentiable rendering module for iterative pose refinement
Synthetic benchmark dataset for reproducible bronchoscopy evaluation
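The iterative pose refinement described above can be sketched in miniature. The toy below is an illustrative assumption, not the paper's implementation: the learned modality-invariant similarity is replaced by a plain depth MSE loss, and the differentiable renderer is reduced to a fronto-parallel plane whose rendered depth depends on one camera translation parameter `t_z`. The gradient-descent loop mirrors the frame-wise optimization idea: render at the current pose, measure consistency with the observed frame, and update the pose.

```python
import numpy as np

# Toy sketch of frame-wise pose optimization via a differentiable renderer.
# Assumptions (not from the paper): the scene is a plane at fixed depth,
# rendered depth = plane_depth - t_z, and the encoder similarity is
# replaced by a per-pixel MSE depth-consistency loss.

def render_depth(t_z, plane_depth=10.0, shape=(4, 4)):
    """Toy renderer, differentiable in t_z: depth map of a plane seen by a
    camera translated t_z along the optical axis."""
    return np.full(shape, plane_depth - t_z)

def optimize_pose(observed, t_z=0.0, lr=0.1, iters=200):
    """Gradient descent on t_z using the analytic gradient of the MSE loss
    mean((render(t_z) - observed)**2)."""
    for _ in range(iters):
        residual = render_depth(t_z) - observed   # per-pixel depth error
        grad = np.mean(2.0 * residual * (-1.0))   # d(loss)/d(t_z)
        t_z -= lr * grad                          # pose update step
    return t_z

# "Observed" frame corresponds to a camera at t_z = 3.0; the loop recovers it.
observed = render_depth(3.0)
t_z_hat = optimize_pose(observed)
```

In the actual framework the single scalar `t_z` would be a full 6-DoF pose, the renderer a differentiable CT-surface renderer, and the loss a similarity between encoder embeddings of the real RGB frame and the CT-rendered depth map; this sketch only conveys the shape of the optimization loop.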
Hongchao Shu
Johns Hopkins University
Digital Twins in Medicine · Computer Vision · Augmented Reality
Roger D. Soberanis-Mukul
Researcher, Advanced Robotics and Computationally Augmented Environments Lab, Johns Hopkins
Deep Learning for Medical Applications · Medical Image Segmentation · Medical Image Classification
Jiru Xu
Johns Hopkins University, Baltimore, Maryland, 21218, USA.
Hao Ding
Johns Hopkins University, Baltimore, Maryland, 21218, USA.
Morgan Ringel
Johnson & Johnson MedTech, Santa Clara, California, 95054, USA.
Mali Shen
Johnson & Johnson MedTech, Santa Clara, California, 95054, USA.
Saif Iftekar Sayed
Johnson & Johnson MedTech, Santa Clara, California, 95054, USA.
Hedyeh Rafii-Tari
Johnson & Johnson MedTech, Santa Clara, California, 95054, USA.
Mathias Unberath
Johns Hopkins University
Medical Robotics · Computer Vision · AI/ML · Extended Reality · HCI