🤖 AI Summary
Existing video relighting methods suffer from limited editing flexibility and temporal inconsistency. To address this, we propose a two-stage framework: an artist first applies arbitrary image-level lighting edits to a single frame; these edits are then propagated temporally by a fine-tuned Stable Video Diffusion (SVD) model, enhanced with gated cross-attention and motion-prior-guided temporal bootstrapping, to produce natural, temporally coherent relighting across the entire sequence. Because the approach decouples lighting editing from temporal synthesis, it supports any off-the-shelf image relighting algorithm. A feature fusion mechanism suppresses artifacts, and training on synthetic data yields strong generalization to real-world videos. Experiments demonstrate that our method achieves superior visual fidelity, temporal consistency, and editing flexibility compared to state-of-the-art approaches, substantially improving the scalability and practicality of dynamic lighting control.
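To make the decoupling concrete, here is a minimal Python sketch of the two-stage flow; the names `relight_video`, `image_relight_fn`, and `propagate_model` are hypothetical placeholders for illustration, not the paper's API.

```python
import torch

def relight_video(video: torch.Tensor, image_relight_fn, propagate_model,
                  ref_index: int = 0) -> torch.Tensor:
    """Illustrative two-stage flow (hypothetical interface).

    video:            (T, C, H, W) input frames
    image_relight_fn: any off-the-shelf image relighting method
    propagate_model:  fine-tuned SVD that spreads the edit over time
    """
    ref_frame = video[ref_index]             # stage 1: pick one frame...
    relit_ref = image_relight_fn(ref_frame)  # ...and relight it freely
    # stage 2: propagate the target illumination across the sequence
    return propagate_model(video, relit_ref, ref_index=ref_index)
```

Because stage 1 only needs a single relit image, any editor output, from a diffusion-based relighter to a physics-based renderer, can be dropped in without retraining the propagation model.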
📝 Abstract
Controlling illumination during video post-production is a crucial yet elusive goal in computational photography. Existing methods often lack flexibility, restricting users to specific relighting models. This paper introduces ReLumix, a novel framework that decouples the relighting algorithm from temporal synthesis, thereby enabling any image relighting technique to be seamlessly applied to video. Our approach reformulates video relighting into a simple yet effective two-stage process: (1) an artist relights a single reference frame using any preferred image-based technique (e.g., diffusion models, physics-based renderers); and (2) a fine-tuned Stable Video Diffusion (SVD) model propagates this target illumination throughout the sequence. To ensure temporal coherence and prevent artifacts, we introduce a gated cross-attention mechanism for smooth feature blending and a temporal bootstrapping strategy that harnesses SVD's powerful motion priors. Although trained on synthetic data, ReLumix shows competitive generalization to real-world videos. The method demonstrates significant improvements in visual fidelity, offering a scalable and versatile solution for dynamic lighting control.
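The abstract does not specify how the gated cross-attention is built; one common way to realize such a mechanism is a zero-initialized, tanh-gated residual cross-attention layer, as in the minimal PyTorch sketch below. The module name, gating scheme, and tensor shapes are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    """Cross-attention whose residual output is scaled by a learnable gate,
    so features from the relit reference frame are blended in smoothly
    (hypothetical sketch, not the paper's code)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # tanh(0) = 0 at init, so training starts from the frozen base behavior
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, video_tokens: torch.Tensor,
                ref_tokens: torch.Tensor) -> torch.Tensor:
        # video_tokens: (B, N, C) features of the frame being denoised
        # ref_tokens:   (B, M, C) features of the relit reference frame
        attn_out, _ = self.attn(self.norm(video_tokens), ref_tokens, ref_tokens)
        return video_tokens + torch.tanh(self.gate) * attn_out
```

A zero-initialized gate is a standard trick for injecting new conditioning into a pretrained model: the fine-tuned SVD begins from its original behavior and gradually learns how strongly to mix in the reference illumination, which is consistent with the stated goal of smooth, artifact-free feature blending.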