Endo-FASt3r: Endoscopic Foundation model Adaptation for Structure from motion

📅 2025-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To improve joint monocular depth and camera pose estimation from endoscopic video in robot-assisted minimally invasive surgery, this paper proposes Endo-FASt3r, which the authors describe as the first monocular self-supervised framework to use foundation models for both depth and pose. The method pairs Reloc3rX, an extension of the Reloc3r relative pose foundation model modified so that it converges under self-supervised training, with DoMoRA, an adaptation technique that permits higher-rank updates and faster convergence than conventional low-rank fine-tuning. Because training is self-supervised, it requires no ground-truth depth or pose annotations. On the SCARED dataset, the approach improves pose estimation by 10% and depth estimation by 2% over prior work. Similar gains on the Hamlyn and StereoMIS datasets indicate that it generalizes across datasets.

📝 Abstract
Accurate depth and camera pose estimation is essential for achieving high-quality 3D visualisations in robotic-assisted surgery. Despite recent advancements in foundation model adaptation to monocular depth estimation of endoscopic scenes via self-supervised learning (SSL), no prior work has explored their use for pose estimation. These methods rely on low rank-based adaptation approaches, which constrain model updates to a low-rank space. We propose Endo-FASt3r, the first monocular SSL depth and pose estimation framework that uses foundation models for both tasks. We extend the Reloc3r relative pose estimation foundation model by designing Reloc3rX, introducing modifications necessary for convergence in SSL. We also present DoMoRA, a novel adaptation technique that enables higher-rank updates and faster convergence. Experiments on the SCARED dataset show that Endo-FASt3r achieves a substantial $10\%$ improvement in pose estimation and a $2\%$ improvement in depth estimation over prior work. Similar performance gains on the Hamlyn and StereoMIS datasets reinforce the generalisability of Endo-FASt3r across different datasets.
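The abstract's remark that low-rank adaptation constrains model updates to a low-rank space can be illustrated with a small numerical sketch. It is not DoMoRA and not the authors' code: it shows only the generic low-rank case, where a weight update is written as the product of two thin matrices. The function name `low_rank_update`, the matrix sizes, and the random initialisation are assumptions chosen for illustration.

```python
import numpy as np


def low_rank_update(d_out: int, d_in: int, rank: int, seed: int = 0) -> np.ndarray:
    """Build a weight update delta_W = B @ A from two thin factors.

    Hypothetical helper for illustration only. The factors are random here
    so that the rank is visible; real adapters typically start with one
    factor at zero.
    """
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((d_out, rank))  # shape (d_out, rank)
    A = rng.standard_normal((rank, d_in))   # shape (rank, d_in)
    return B @ A                            # shape (d_out, d_in)


d_out, d_in, rank = 64, 64, 4

# The update has full shape but can never exceed the chosen rank.
delta = low_rank_update(d_out, d_in, rank)
delta_rank = int(np.linalg.matrix_rank(delta))

# An unconstrained update of the same shape, for comparison.
full = np.random.default_rng(1).standard_normal((d_out, d_in))
full_rank = int(np.linalg.matrix_rank(full))

# Trainable parameters: thin factors versus the dense matrix.
low_rank_params = rank * (d_out + d_in)
dense_params = d_out * d_in

print(delta.shape, delta_rank, full_rank, low_rank_params, dense_params)
```

The thin factors use far fewer parameters than a dense update, but the resulting update is confined to rank at most `rank`. That is the limitation the abstract says higher-rank adaptation is meant to relax.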
Problem

Research questions and friction points this paper is trying to address.

Accurate depth and pose estimation in robotic-assisted surgery.
Adapting foundation models for monocular SSL depth and pose estimation.
Improving convergence and performance with novel adaptation techniques.
Innovation

Methods, ideas, or system contributions that make the work stand out.

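Monocular SSL depth and pose estimation framework that uses foundation models for both tasks.
Reloc3rX, an extension of Reloc3r modified to converge under SSL for pose estimation.
DoMoRA, an adaptation technique for higher-rank updates and faster convergence.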
Mona Sheikh Zeinoddin
UCL Hawkes Institute, University College London, UK; Institute of Health Informatics, University College London, UK
Mobarakol Islam
UCL Hawkes Institute, University College London, UK; Dept of Medical Physics & Biomedical Engineering, University College London, UK
Zafer Tandogdu
University College London
Greg Shaw
Dept of Urology, University College London Hospitals, UK
Mathew J. Clarkson
UCL Hawkes Institute, University College London, UK; Dept of Medical Physics & Biomedical Engineering, University College London, UK
Evangelos Mazomenos
Associate Professor, University College London
Computer-Assisted Interventions, Surgical Data Science, Surgical Robotics, Biomedical Signal Process
Danail Stoyanov
Professor of Robot Vision, University College London
Surgical Vision, Surgical AI, Surgical Robotics, Computer Assisted Interventions, Surgical Data Science