🤖 AI Summary
This paper proposes Lighting in Motion (LiMo), a diffusion-based framework for spatiotemporal HDR lighting estimation that targets two problems: distorted high-frequency detail and inaccurate illuminance. The method has three main components. First, because depth alone proves insufficient for spatial conditioning, it introduces a geometric condition that encodes the position of the scene relative to the target 3D position. Second, it predicts mirrored and diffuse spheres at multiple exposures and combines them into a single HDRI map using differentiable rendering. Third, it fine-tunes existing diffusion models on a large-scale custom dataset of indoor and outdoor scenes paired with spatiotemporal light probes. The evaluation reports state-of-the-art results in both spatial control and prediction accuracy.
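To make the idea of a geometry-aware spatial condition concrete, here is a minimal sketch that turns a depth map into a per-pixel map of offsets from a target 3D point to the scene surface. It is an illustration only, not the paper's encoding: the pinhole camera model, the function name `relative_position_map`, and the choice to return a unit direction plus a distance are all assumptions made for this example.

```python
import numpy as np

def relative_position_map(depth, fx, fy, cx, cy, target_xyz):
    """Per-pixel offset from a target 3D point to the scene (camera frame).

    Hypothetical helper: assumes a pinhole camera with intrinsics
    (fx, fy, cx, cy) and a depth map storing z-distance per pixel.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices

    # Unproject every pixel to a 3D point in the camera frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    points = np.stack([x, y, depth], axis=-1)        # shape (H, W, 3)

    # Express the scene relative to the target position.
    offsets = points - np.asarray(target_xyz, dtype=np.float64)
    distance = np.linalg.norm(offsets, axis=-1, keepdims=True)
    direction = offsets / np.maximum(distance, 1e-8)  # avoid division by zero
    return direction, distance
```

Unlike a raw depth map, this representation changes when the target position moves, which is the property a spatial condition needs in order to distinguish probe locations within the same image.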
📝 Abstract
We present Lighting in Motion (LiMo), a diffusion-based approach to spatiotemporal lighting estimation. LiMo targets both realistic high-frequency detail prediction and accurate illuminance estimation. To account for both, we propose generating a set of mirrored and diffuse spheres at different exposures, based on their 3D positions in the input. Making use of diffusion priors, we fine-tune powerful existing diffusion models on a large-scale customized dataset of indoor and outdoor scenes, paired with spatiotemporal light probes. For accurate spatial conditioning, we demonstrate that depth alone is insufficient and we introduce a new geometric condition to provide the relative position of the scene to the target 3D position. Finally, we combine diffuse and mirror predictions at different exposures into a single HDRI map leveraging differentiable rendering. We thoroughly evaluate our method and design choices to establish LiMo as state-of-the-art for both spatial control and prediction accuracy.
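To illustrate why predictions at several exposures help recover a high dynamic range, the sketch below merges a stack of low-dynamic-range images into one linear radiance map. It substitutes a classical closed-form weighted merge for the differentiable-rendering step described in the abstract, so it should be read as a simplified stand-in. The function name `merge_exposures`, the linear (gamma-free) inputs, the model `ldr = clip(hdr * scale, 0, 1)`, and the thresholds `low` and `high` are assumptions chosen for the example.

```python
import numpy as np

def merge_exposures(ldr_stack, exposure_scales, low=0.05, high=0.95):
    """Merge linear LDR images taken at different exposures into one HDR image.

    Hypothetical helper: assumes each image equals clip(hdr * scale, 0, 1).
    """
    ldr = np.asarray(ldr_stack, dtype=np.float64)     # shape (N, H, W, C)
    scales = np.asarray(exposure_scales, dtype=np.float64).reshape(-1, 1, 1, 1)

    # Trust only pixels that are neither near-black nor near-saturated.
    weights = ((ldr > low) & (ldr < high)).astype(np.float64)

    # Undo the exposure scaling to get a radiance estimate per image.
    radiance = ldr / scales

    weight_sum = weights.sum(axis=0)
    merged = (weights * radiance).sum(axis=0) / np.maximum(weight_sum, 1e-8)

    # Where no exposure is trusted, fall back to the shortest exposure,
    # which is the least likely to be saturated.
    fallback = radiance[np.argmin(scales)]
    return np.where(weight_sum > 0, merged, fallback)
```

Short exposures keep bright light sources unclipped, while long exposures resolve dim regions; the merge takes each pixel from the exposures where it is well exposed. The fallback is a crude choice for pixels that are dark in every exposure, where the longest exposure would be the better estimate.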