PPS-Ctrl: Controllable Sim-to-Real Translation for Colonoscopy Depth Estimation

📅 2025-04-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
In colonoscopy depth estimation, accurate ground-truth depth is hard to acquire, and existing sim-to-real image translation methods often suffer from structural distortion and unrealistic textures. To address these issues, the paper proposes a lighting-aware, structure-constrained, controllable image translation method. It introduces per-pixel shading (PPS) maps, derived from illumination modeling, as geometric priors for ControlNet conditioning, providing more robust structural guidance than conventional depth maps. The approach pairs this conditioning with Stable Diffusion so that structural fidelity and textural realism are optimized jointly. The resulting end-to-end differentiable sim-to-real translation framework outperforms MI-CycleGAN on colonoscopy depth estimation: translated images show higher structural consistency and more photorealistic texture, and depth prediction error is reduced by 18.7%. Code is publicly available.

📝 Abstract
Accurate depth estimation enhances endoscopy navigation and diagnostics, but obtaining ground-truth depth in clinical settings is challenging. Synthetic datasets are often used for training, yet the domain gap limits generalization to real data. We propose a novel image-to-image translation framework that preserves structure while generating realistic textures from clinical data. Our key innovation integrates Stable Diffusion with ControlNet, conditioned on a latent representation extracted from a Per-Pixel Shading (PPS) map. PPS captures surface lighting effects, providing a stronger structural constraint than depth maps. Experiments show our approach produces more realistic translations and improves depth estimation over GAN-based MI-CycleGAN. Our code is publicly accessible at https://github.com/anaxqx/PPS-Ctrl.
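To make the idea of a PPS map concrete, the following is a minimal sketch of one plausible way to compute it from a synthetic depth map. It is not taken from the paper or its repository. It assumes a pinhole camera, a single point light co-located with the camera (a common approximation for endoscopes), Lambertian shading, and inverse-square light falloff, so that each pixel stores (L · N) / r². The function name `pps_from_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for illustration.

```python
import numpy as np


def pps_from_depth(depth, fx, fy, cx, cy, eps=1e-8):
    """Illustrative per-pixel shading map: (L . N) / r^2.

    Assumes a point light at the camera centre and Lambertian surfaces;
    this formulation is an assumption, not the paper's confirmed definition.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))

    # Back-project every pixel to a 3D point in camera coordinates.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    pts = np.stack([x, y, depth], axis=-1)            # (h, w, 3)

    # Surface normals from the cross product of the two image-space tangents.
    du = np.gradient(pts, axis=1)
    dv = np.gradient(pts, axis=0)
    normals = np.cross(du, dv)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True) + eps

    # Unit direction from each surface point toward the light (the origin).
    r = np.linalg.norm(pts, axis=-1, keepdims=True)   # distance to light
    light_dir = -pts / (r + eps)

    # Flip normals that point away from the camera so they face the light.
    facing_away = np.sum(normals * light_dir, axis=-1, keepdims=True) < 0
    normals = np.where(facing_away, -normals, normals)

    # Cosine term attenuated by inverse-square falloff.
    cosine = np.clip(np.sum(normals * light_dir, axis=-1), 0.0, 1.0)
    return cosine / (r[..., 0] ** 2 + eps)


if __name__ == "__main__":
    # A flat wall two units in front of the camera.
    wall = np.full((5, 5), 2.0)
    pps = pps_from_depth(wall, fx=10.0, fy=10.0, cx=2.0, cy=2.0)
    print(pps.shape, float(pps[2, 2]))
```

For a flat wall at depth d facing the camera, the centre pixel evaluates to roughly 1/d² and the value falls off toward the image corners. This shows how such a map encodes both surface orientation and distance, which is one way to read the claim that PPS constrains structure more strongly than depth alone.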
Problem

Research questions and friction points this paper is trying to address.

Bridging domain gap in colonoscopy depth estimation
Generating realistic textures from synthetic data
Improving depth estimation accuracy with PPS maps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Stable Diffusion with ControlNet
Integrates Per-Pixel Shading maps
Generates realistic textures from clinical data
Authors
Xinqi Xiong
University of North Carolina at Chapel Hill, Chapel Hill, USA
Andrea Dunn Beltran
University of North Carolina at Chapel Hill, Chapel Hill, USA
Jun Myeong Choi
University of North Carolina at Chapel Hill, Chapel Hill, USA
Marc Niethammer
Professor of Computer Science, UC San Diego
Medical image analysis, machine learning, image registration
Roni Sengupta
Assistant Professor, University of North Carolina at Chapel Hill
Computer vision, computer graphics, computational photography