🤖 AI Summary
Existing video relighting methods suffer from limited editing flexibility and temporal inconsistency. To address this, we propose a two-stage framework: an artist first applies arbitrary image-level lighting edits to a single frame; these edits are then propagated temporally by a fine-tuned Stable Video Diffusion (SVD) model, enhanced with gated cross-attention and motion-prior-guided temporal bootstrapping, to produce natural, temporally coherent relighting across the entire sequence. Because the approach decouples lighting editing from temporal synthesis, it supports any off-the-shelf image relighting algorithm. A feature fusion mechanism suppresses artifacts, and training on synthetic data yields strong generalization to real-world videos. Experiments demonstrate that our method achieves superior visual fidelity, temporal consistency, and editing flexibility compared to state-of-the-art approaches, substantially improving the scalability and practicality of dynamic lighting control.
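To make the decoupling concrete, here is a minimal Python sketch of the two-stage flow; the names `relight_video`, `image_relight_fn`, and `propagate_model` are hypothetical placeholders for illustration, not the paper's API.

```python
import torch

def relight_video(video: torch.Tensor, image_relight_fn, propagate_model,
                  ref_index: int = 0) -> torch.Tensor:
    """Illustrative two-stage flow (hypothetical interface).

    video:            (T, C, H, W) input frames
    image_relight_fn: any off-the-shelf image relighting method
    propagate_model:  fine-tuned SVD that spreads the edit over time
    """
    ref_frame = video[ref_index]             # stage 1: pick one frame...
    relit_ref = image_relight_fn(ref_frame)  # ...and relight it freely
    # stage 2: propagate the target illumination across the sequence
    return propagate_model(video, relit_ref, ref_index=ref_index)
```

Because stage 1 only needs a single relit image, any editor output, from a diffusion-based relighter to a physics-based renderer, can be dropped in without retraining the propagation model.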
📝 Abstract
Controlling illumination during video post-production is a crucial yet elusive goal in computational photography. Existing methods often lack flexibility, restricting users to specific relighting models. This paper introduces ReLumix, a novel framework that decouples the relighting algorithm from temporal synthesis, thereby enabling any image relighting technique to be seamlessly applied to video. Our approach reformulates video relighting into a simple yet effective two-stage process: (1) an artist relights a single reference frame using any preferred image-based technique (e.g., diffusion models, physics-based renderers); and (2) a fine-tuned Stable Video Diffusion (SVD) model propagates this target illumination throughout the sequence. To ensure temporal coherence and prevent artifacts, we introduce a gated cross-attention mechanism for smooth feature blending and a temporal bootstrapping strategy that harnesses SVD's powerful motion priors. Although trained on synthetic data, ReLumix shows competitive generalization to real-world videos. The method demonstrates significant improvements in visual fidelity, offering a scalable and versatile solution for dynamic lighting control.
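The abstract does not specify how the gated cross-attention is built; one common way to realize such a mechanism is a zero-initialized, tanh-gated residual cross-attention layer, as in the minimal PyTorch sketch below. The module name, gating scheme, and tensor shapes are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    """Cross-attention whose residual output is scaled by a learnable gate,
    so features from the relit reference frame are blended in smoothly
    (hypothetical sketch, not the paper's code)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # tanh(0) = 0 at init, so training starts from the frozen base behavior
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, video_tokens: torch.Tensor,
                ref_tokens: torch.Tensor) -> torch.Tensor:
        # video_tokens: (B, N, C) features of the frame being denoised
        # ref_tokens:   (B, M, C) features of the relit reference frame
        attn_out, _ = self.attn(self.norm(video_tokens), ref_tokens, ref_tokens)
        return video_tokens + torch.tanh(self.gate) * attn_out
```

A zero-initialized gate is a standard trick for injecting new conditioning into a pretrained model: the fine-tuned SVD begins from its original behavior and gradually learns how strongly to mix in the reference illumination, which is consistent with the stated goal of smooth, artifact-free feature blending.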