ColonAdapter: Geometry Estimation Through Foundation Model Adaptation for Colonoscopy

📅 2025-11-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses inaccurate 3D geometry estimation from monocular colonoscopy images, which is caused by non-Lambertian reflectance, moving illumination, and large textureless regions. We propose ColonAdapter, a self-supervised fine-tuning framework that adapts general-purpose geometric foundation models to clinical settings, improving accuracy in depth prediction, camera pose estimation, and dense 3D point map reconstruction. Key innovations include a Detail Restoration Module (DRM), a geometry consistency loss, and a confidence-weighted photometric loss. Together, these target low-texture regions, scale consistency, and training stability, and the method does not require ground-truth camera intrinsics. Evaluated on both synthetic and real colonoscopy datasets, our method achieves state-of-the-art performance across monocular depth estimation, camera pose estimation, and 3D reconstruction tasks.

📝 Abstract
Estimating 3D geometry from monocular colonoscopy images is challenging due to non-Lambertian surfaces, moving light sources, and large textureless regions. While recent 3D geometric foundation models eliminate the need for multi-stage pipelines, their performance deteriorates in clinical scenes. These models are primarily trained on natural scene datasets and struggle with specularity and homogeneous textures typical in colonoscopy, leading to inaccurate geometry estimation. In this paper, we present ColonAdapter, a self-supervised fine-tuning framework that adapts geometric foundation models for colonoscopy geometry estimation. Our method leverages pretrained geometric priors while tailoring them to clinical data. To improve performance in low-texture regions and ensure scale consistency, we introduce a Detail Restoration Module (DRM) and a geometry consistency loss. Furthermore, a confidence-weighted photometric loss enhances training stability in clinical environments. Experiments on both synthetic and real datasets demonstrate that our approach achieves state-of-the-art performance in camera pose estimation, monocular depth prediction, and dense 3D point map reconstruction, without requiring ground-truth intrinsic parameters.
Problem

Research questions and friction points this paper is trying to address.

Adapts geometric foundation models for colonoscopy geometry estimation
Addresses challenges of specularity and homogeneous textures in colonoscopy
Enhances accuracy in camera pose, depth prediction, and 3D reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised fine-tuning of geometric foundation models
Detail Restoration Module for low-texture region enhancement
Confidence-weighted photometric loss for training stability
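
To illustrate the idea behind a confidence-weighted photometric loss, here is a minimal sketch in plain Python. It is not the paper's formulation, which this summary does not give. It assumes a common form in which each pixel's L1 photometric error is scaled by a predicted confidence and a log term discourages the confidence from collapsing to zero. The function name, the `alpha` weight, and the toy images are hypothetical.

```python
import math


def confidence_weighted_photometric_loss(target, warped, confidence,
                                         alpha=0.2, eps=1e-6):
    """Mean over pixels of c * |target - warped| - alpha * log(c).

    target, warped, confidence: 2D lists of floats with the same shape.
    alpha: hypothetical regularization weight (an assumption, not from the paper).
    """
    total = 0.0
    count = 0
    for t_row, w_row, c_row in zip(target, warped, confidence):
        for t, w, c in zip(t_row, w_row, c_row):
            c = max(c, eps)  # guard against log(0)
            total += c * abs(t - w) - alpha * math.log(c)
            count += 1
    if count == 0:
        raise ValueError("inputs must contain at least one pixel")
    return total / count


# Toy 2x2 grayscale images: one pixel of the warped view disagrees with the
# target, as a specular highlight might.
target = [[0.5, 0.5], [0.5, 0.5]]
warped = [[0.5, 0.9], [0.5, 0.5]]

# Full confidence everywhere versus reduced confidence on the outlier pixel.
uniform = confidence_weighted_photometric_loss(
    target, warped, [[1.0, 1.0], [1.0, 1.0]])
downweighted = confidence_weighted_photometric_loss(
    target, warped, [[1.0, 0.5], [1.0, 1.0]])
```

In this toy case, lowering the confidence on the mismatched pixel reduces the loss, while the log term makes lowering confidence everywhere costly. This is one plausible way such a weighting could stabilize training on scenes with specular highlights.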
Zhiyi Jiang
School of Computer Science and Engineering, Southeast University, China
Yifu Wang
Tencent XR Vision Labs
Computer Vision, Robotics, Event-based Vision, SLAM, Visual Odometry
Xuelian Cheng
Monash University
3D Vision, Medical Imaging, Machine Learning
Zongyuan Ge
Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia