🤖 AI Summary
This work addresses the domain gap between simulated and real images, which limits monocular depth estimation (MDE) in colonoscopy. Existing image-to-image translation methods often fail to preserve structural consistency and photorealism at the same time, which leads to geometric distortions and specular artifacts. The authors propose a Structure-to-Image paradigm that treats the depth map as the core generative prior rather than as a posterior constraint. According to the authors, the approach is the first to bring phase congruency to colonoscopic domain adaptation, and it adds a cross-level structure constraint so that geometric structure and fine details such as vascular texture are optimized jointly. In zero-shot evaluation on a public phantom dataset, an MDE model fine-tuned on the generated data reduces RMSE by up to 44.18% relative to competing methods.
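To make the headline number concrete, the following is a minimal sketch of how a relative RMSE reduction is typically computed for depth maps. The function names, the optional validity mask, and the toy depth values are assumptions for illustration only; they are not taken from the paper or its repository, and the numbers are not the paper's results.

```python
import numpy as np


def rmse(pred, target, mask=None):
    """Root-mean-square error between predicted and reference depth.

    `mask` (hypothetical) selects valid pixels; all pixels are used if omitted.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if mask is None:
        mask = np.ones(target.shape, dtype=bool)
    diff = pred[mask] - target[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def relative_reduction(baseline_rmse, new_rmse):
    """Percentage by which new_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Toy data only: a reference depth map and two made-up predictions.
reference = np.array([[1.0, 2.0], [3.0, 4.0]])
baseline_pred = reference + 2.0   # constant error of 2 depth units
improved_pred = reference + 1.0   # constant error of 1 depth unit

baseline_rmse = rmse(baseline_pred, reference)
improved_rmse = rmse(improved_pred, reference)
reduction = relative_reduction(baseline_rmse, improved_rmse)
print(f"RMSE {baseline_rmse:.2f} -> {improved_rmse:.2f}, reduction {reduction:.1f}%")
```

Read this way, a reduction of "up to 44.18%" means the best case across the compared methods and settings, not an average.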
📝 Abstract
Monocular depth estimation (MDE) for colonoscopy is hampered by the domain gap between simulated and real-world images. Existing image-to-image translation methods, which use depth as a posterior constraint, often fail to balance realism with structural consistency and therefore produce structural distortions and specular highlights. To address this, we propose a Structure-to-Image paradigm that transforms the depth map from a passive constraint into an active generative foundation. We are the first to introduce phase congruency to colonoscopic domain adaptation, and we design a cross-level structure constraint to co-optimize geometric structures and fine-grained details such as vascular textures. In zero-shot evaluations on a publicly available phantom dataset, an MDE model fine-tuned on our generated data reduced RMSE by up to 44.18% compared to competing methods. Our code is available at https://github.com/YyangJJuan/PC-S2I.git.
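For readers unfamiliar with phase congruency, the sketch below illustrates the basic idea in one dimension: a feature is marked where the local phases of a signal's frequency components agree across scales, measured as the magnitude of the summed band-pass responses divided by the sum of their magnitudes. This is a generic textbook-style measure built on a log-Gabor filter bank, not the authors' implementation; the function names, the wavelengths, the bandwidth parameter `sigma_on_f`, and the step-edge test signal are all assumptions, and noise compensation is omitted.

```python
import numpy as np


def log_gabor_bank(n, wavelengths, sigma_on_f=0.55):
    """One-sided log-Gabor filters in the frequency domain (assumed parameters)."""
    freqs = np.fft.fftfreq(n)
    positive = freqs > 0          # keep positive frequencies only (analytic response)
    bank = []
    for wavelength in wavelengths:
        f0 = 1.0 / wavelength     # centre frequency of this scale
        g = np.zeros(n)
        g[positive] = np.exp(
            -(np.log(freqs[positive] / f0) ** 2) / (2 * np.log(sigma_on_f) ** 2)
        )
        bank.append(g)
    return bank


def phase_congruency_1d(signal, wavelengths=(4, 8, 16, 32), eps=1e-6):
    """Basic phase congruency: |sum of complex responses| / sum of |responses|."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.fft(x)
    # Each response is complex: real part = even filter, imaginary part = odd filter.
    responses = [np.fft.ifft(spectrum * g) for g in log_gabor_bank(x.size, wavelengths)]
    energy = np.abs(np.sum(responses, axis=0))
    amplitude_sum = np.sum([np.abs(r) for r in responses], axis=0)
    return energy / (amplitude_sum + eps)


# Toy signal: a step edge between samples 127 and 128.
step = np.concatenate([np.zeros(128), np.ones(128)])
pc = phase_congruency_1d(step)
print("phase congruency next to the edge:", pc[127], pc[128])
```

Because the measure is a ratio, it is largely insensitive to contrast and brightness, which is the usual motivation for using phase congruency as a structural cue when illumination varies.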