🤖 AI Summary
In colonoscopy depth estimation, acquiring accurate ground-truth depth is challenging, and existing sim-to-real image translation methods often suffer from structural distortion and unrealistic texture. To address these issues, this paper proposes a lighting-aware, structure-constrained controllable image translation method. It introduces Per-Pixel Shading (PPS) maps, derived from illumination modeling, as geometric priors in ControlNet conditioning, offering more robust structural guidance than conventional depth maps. Built on Stable Diffusion, the resulting end-to-end differentiable sim-to-real translation framework jointly optimizes structural fidelity and textural realism, and significantly outperforms MI-CycleGAN on colonoscopy depth estimation: translated images show higher structural consistency and more photorealistic texture, and depth prediction error drops by 18.7%. Code is publicly available.
📝 Abstract
Accurate depth estimation enhances endoscopy navigation and diagnostics, but obtaining ground-truth depth in clinical settings is challenging. Synthetic datasets are often used for training, yet the domain gap limits generalization to real data. We propose a novel image-to-image translation framework that preserves structure while generating realistic textures from clinical data. Our key innovation integrates Stable Diffusion with ControlNet, conditioned on a latent representation extracted from a Per-Pixel Shading (PPS) map. PPS captures surface lighting effects, providing a stronger structural constraint than depth maps. Experiments show our approach produces more realistic translations and improves depth estimation over GAN-based MI-CycleGAN. Our code is publicly accessible at https://github.com/anaxqx/PPS-Ctrl.
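To make the PPS conditioning signal concrete, here is a minimal sketch of computing a per-pixel shading map from a depth map. It assumes a Lambertian surface and a point light co-located with the camera with inverse-square falloff (a common illumination model for endoscopy); the paper's exact formulation, intrinsics, and normalization may differ, and all parameter names here are illustrative.

```python
import numpy as np

def pps_map(depth, fx=500.0, fy=500.0, cx=None, cy=None):
    """Per-Pixel Shading (PPS) sketch: Lambertian shading under a point
    light at the camera origin, with inverse-square attenuation.

    depth : (H, W) array of camera-frame Z values (arbitrary units).
    fx, fy, cx, cy : hypothetical pinhole intrinsics (illustrative).
    """
    h, w = depth.shape
    cx = (w - 1) / 2 if cx is None else cx
    cy = (h - 1) / 2 if cy is None else cy

    # Back-project each pixel to a 3D point in camera coordinates.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1)            # (H, W, 3)

    # Surface normals from finite differences of the point cloud;
    # cross(dy, dx) orients normals toward the camera (-Z).
    dx = np.gradient(pts, axis=1)
    dy = np.gradient(pts, axis=0)
    n = np.cross(dy, dx)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8

    # Light direction: from surface point back toward the origin.
    r = np.linalg.norm(pts, axis=-1, keepdims=True) + 1e-8
    l = -pts / r

    # Lambertian cosine term with 1/r^2 attenuation, normalized to [0, 1].
    shading = np.clip((n * l).sum(-1), 0.0, None) / (r[..., 0] ** 2)
    return shading / (shading.max() + 1e-8)
```

Because the shading term couples surface normals, viewing distance, and light falloff, a map like this encodes finer geometric cues than raw depth alone, which is the intuition behind using it as the ControlNet condition.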