🤖 AI Summary
This work addresses two obstacles to estimating lighting from a single LDR image: existing methods generalize poorly because they rely on limited HDR panorama datasets, and diffusion-based chrome-ball inpainting tends to insert incorrect or inconsistent content and cannot directly produce HDR light probes. We propose DiffusionLight and its faster variant, DiffusionLight-Turbo, which reformulate lighting estimation as a chrome-ball inpainting task for a pre-trained Stable Diffusion XL model. The core contributions are: (1) DiffusionLight, which inpaints the chrome ball iteratively and takes the median of multiple outputs as a stable, low-frequency lighting prior that guides the final result; (2) an Exposure LoRA that generates LDR chrome balls at multiple exposure values, which are then merged into an HDR light probe; and (3) a Turbo LoRA trained to predict the averaged chrome ball directly, which, together with LoRA swapping in a single denoising pass, cuts the runtime from about 30 minutes to about 30 seconds (a 60× speedup) with minimal quality loss. Experiments show convincing light estimates across diverse settings and improved generalization to in-the-wild scenes.
📝 Abstract
We introduce a simple yet effective technique for estimating lighting from a single low-dynamic-range (LDR) image by reframing the task as a chrome ball inpainting problem. This approach leverages a pre-trained diffusion model, Stable Diffusion XL, to overcome the generalization failures of existing methods that rely on limited high-dynamic-range (HDR) panorama datasets. While conceptually simple, the task remains challenging because diffusion models often insert incorrect or inconsistent content and cannot readily generate chrome balls in HDR format. Our analysis reveals that the inpainting process is highly sensitive to the initial noise in the diffusion process, occasionally resulting in unrealistic outputs. To address this, we first introduce DiffusionLight, which uses iterative inpainting to compute a median chrome ball from multiple outputs; this median serves as a stable, low-frequency lighting prior that guides the generation of a high-quality final result. To generate HDR light probes, an Exposure LoRA is fine-tuned to create LDR images at multiple exposure values, which are then merged. While effective, DiffusionLight is time-intensive, requiring approximately 30 minutes per estimation. To reduce this overhead, we introduce DiffusionLight-Turbo, which reduces the runtime to about 30 seconds with minimal quality loss. This 60x speedup is achieved by training a Turbo LoRA to directly predict the averaged chrome balls from the iterative process. Inference is further streamlined into a single denoising pass using a LoRA swapping technique. Experimental results show that our method produces convincing light estimates across diverse settings and demonstrates superior generalization to in-the-wild scenarios. Our code is available at https://diffusionlight.github.io/turbo
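The two post-processing ideas in the abstract, taking a per-pixel median over several inpainted chrome balls and merging LDR images rendered at several exposure values into one HDR image, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `median_chrome_ball` and `merge_exposures`, the pure power-law gamma of 2.4, and the binary saturation weighting are all assumptions chosen for illustration, and the diffusion model itself is left out.

```python
import numpy as np


def median_chrome_ball(samples):
    """Per-pixel median over N candidate chrome balls, each of shape (H, W, 3).

    The median suppresses occasional outlier generations, which is the
    motivation the abstract gives for a "stable" lighting prior.
    """
    return np.median(np.stack(samples, axis=0), axis=0)


def merge_exposures(ldr_images, evs, gamma=2.4, saturation=0.98):
    """Merge LDR images rendered at exposure offsets `evs` into linear HDR.

    Assumptions (not taken from the paper): a pure power-law display curve
    and a near-binary weight that distrusts clipped pixels.
    """
    numerator = 0.0
    denominator = 0.0
    for ldr, ev in zip(ldr_images, evs):
        ldr = np.asarray(ldr, dtype=np.float64)
        linear = np.power(ldr, gamma)        # undo the display gamma
        radiance = linear * 2.0 ** (-ev)     # undo the exposure offset
        weight = (ldr < saturation) + 1e-6   # trust unclipped pixels
        numerator = numerator + weight * radiance
        denominator = denominator + weight
    return numerator / denominator
```

In this sketch a darker exposure (negative EV) keeps bright light sources below the clipping point, so their true intensity can be recovered after rescaling, while the brighter exposures supply the well-exposed detail for the rest of the ball.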