🤖 AI Summary
Indoor illumination estimation, i.e., recovering a spatiotemporally continuous light field from a single image or video, is severely ill-posed, and existing methods generalize poorly zero-shot to in-the-wild scenes. To address this, we propose a diffusion-prior-driven framework that models the light field as an MLP: a fine-tuned, pre-trained 2D diffusion model serves as a strong illumination prior; multiple chrome-ball light probes, jointly inpainted into the image, provide illumination observations at several locations and complete occluded regions; and temporal consistency constraints regularize video-based illumination estimation. Combining insights from neural radiance fields with a continuous light field representation, our method achieves the first high-fidelity, spatiotemporally coherent dynamic illumination reconstruction from in-the-wild videos, without scene-specific training. Extensive experiments demonstrate significant improvements over state-of-the-art baselines on both single-image and video illumination estimation.
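The summary leaves the light-field representation abstract. Below is a minimal sketch of one plausible realization: an MLP that maps an encoded (position, time) query to a low-resolution HDR environment map. All specifics here (NeRF-style Fourier encoding, layer widths, the `LightFieldMLP` name, the equirectangular output) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def fourier_encode(x: torch.Tensor, n_freqs: int = 6) -> torch.Tensor:
    """NeRF-style positional encoding: x -> [sin(2^k pi x), cos(2^k pi x)]."""
    freqs = 2.0 ** torch.arange(n_freqs, device=x.device) * torch.pi
    angles = x[..., None] * freqs                 # (..., D, n_freqs)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return enc.flatten(start_dim=-2)              # (..., D * 2 * n_freqs)

class LightFieldMLP(nn.Module):
    """Hypothetical continuous light field: (position, time) -> HDR env map.

    Queried at any 3D location and timestamp, it returns a low-resolution
    equirectangular environment map describing the incident lighting there.
    """
    def __init__(self, n_freqs: int = 6, hidden: int = 256, env_hw=(16, 32)):
        super().__init__()
        in_dim = 4 * 2 * n_freqs                  # encoded (x, y, z, t)
        out_dim = 3 * env_hw[0] * env_hw[1]       # flattened RGB env map
        self.env_hw, self.n_freqs = env_hw, n_freqs
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim), nn.Softplus(),  # non-negative radiance
        )

    def forward(self, xyz: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        query = torch.cat([xyz, t[..., None]], dim=-1)   # (B, 4)
        env = self.net(fourier_encode(query, self.n_freqs))
        return env.view(-1, 3, *self.env_hw)             # (B, 3, H, W)

# Query the lighting at two probe locations at time t = 0.5:
field = LightFieldMLP()
envs = field(torch.tensor([[0.0, 1.0, 2.0], [1.0, 1.0, 2.0]]),
             torch.tensor([0.5, 0.5]))
print(envs.shape)  # torch.Size([2, 3, 16, 32])
```

In this reading, the inpainted chrome-ball probes supply per-location lighting observations, and the diffusion prior plus temporal consistency terms supervise the MLP's outputs across space and time.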
📝 Abstract
Indoor lighting estimation from a single image or video remains challenging due to its highly ill-posed nature, especially when the lighting of the scene varies spatially and temporally. We propose a method that estimates, from an input video, a continuous light field describing the spatiotemporally varying lighting of the scene. We leverage 2D diffusion priors to optimize such a light field, which we represent as an MLP. To enable zero-shot generalization to in-the-wild scenes, we fine-tune a pre-trained image diffusion model to predict lighting at multiple locations by jointly inpainting multiple chrome balls as light probes. We evaluate our method on indoor lighting estimation from single images and videos and show superior performance over the compared baselines. Most importantly, we demonstrate spatiotemporally consistent lighting estimation from in-the-wild videos, a result rarely shown in prior work.
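To make the chrome-ball mechanism concrete, here is a sketch of jointly inpainting several probes with an off-the-shelf Stable Diffusion inpainting pipeline from `diffusers`. Note the hedges: the paper fine-tunes its own diffusion model for multi-probe prediction, whereas this uses the stock `runwayml/stable-diffusion-inpainting` checkpoint as a stand-in, and the probe locations, radius, and prompt are assumptions for illustration.

```python
import torch
from PIL import Image, ImageDraw
from diffusers import StableDiffusionInpaintPipeline

# Stock checkpoint as a stand-in; the paper's fine-tuned multi-probe model
# is not reproduced here.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def chrome_ball_mask(size, centers, radius):
    """White disks at the probe locations; white = region to inpaint."""
    mask = Image.new("L", size, 0)
    draw = ImageDraw.Draw(mask)
    for cx, cy in centers:
        draw.ellipse([cx - radius, cy - radius, cx + radius, cy + radius], fill=255)
    return mask

image = Image.open("scene.png").convert("RGB").resize((512, 512))
centers = [(128, 300), (256, 300), (384, 300)]   # assumed probe pixel locations
mask = chrome_ball_mask(image.size, centers, radius=48)

# Inpaint all probes in one pass so they reflect a mutually consistent
# lighting estimate, rather than inpainting each ball independently.
result = pipe(
    prompt="a perfect mirrored reflective chrome ball sphere",
    image=image,
    mask_image=mask,
).images[0]
result.save("scene_with_probes.png")
```

The inpainted mirror spheres act as virtual light probes: unwrapping each reflection yields an environment-map observation at that location, which can then supervise a continuous light field such as the MLP sketched above.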