🤖 AI Summary
Existing text-to-video models struggle with fine-grained, physically plausible video relighting due to limited textual expressiveness for lighting, the absence of lighting-aware pretraining, and the scarcity of real-world multi-illumination video data. To address this, we introduce RelightVideo, the first multi-condition dynamic video dataset explicitly designed for relighting, and propose the Multi-plane Light Image (MPLI), a novel visual prompt that explicitly encodes 3D light source position, intensity, and color, enabling multi-light support and zero-shot generalization to unseen illuminations. We further design a Light Image Adapter that seamlessly injects MPLI features into pretrained video DiT models without catastrophic forgetting. Leveraging Unreal Engine-synthesized data, Video VAE compression, and the DiT architecture, our method achieves high-fidelity, controllable relighting while preserving spatiotemporal content consistency. Experiments demonstrate significant improvements over baselines in shadow synthesis, cross-illumination transfer, and other relighting tasks.
📝 Abstract
Recent advances in diffusion models enable high-quality video generation and editing, but precise relighting with consistent video content, which is critical for shaping scene atmosphere and viewer attention, remains unexplored. Mainstream text-to-video (T2V) models lack fine-grained lighting control due to text's inherent limitations in describing lighting details and insufficient pre-training on lighting-related prompts. Additionally, constructing high-quality relighting training data is challenging, as real-world controllable lighting data is scarce. To address these issues, we propose RelightMaster, a novel framework for accurate and controllable video relighting. First, we build RelightVideo, the first dataset with identical dynamic content under varying, precisely controlled lighting conditions, rendered using the Unreal Engine. Second, we introduce the Multi-plane Light Image (MPLI), a novel visual prompt inspired by the Multi-Plane Image (MPI). MPLI models lighting via K depth-aligned planes, representing 3D light source positions, intensities, and colors while supporting multi-source scenarios and generalizing to unseen light setups. Third, we design a Light Image Adapter that seamlessly injects MPLI into pre-trained Video Diffusion Transformers (DiT): it compresses MPLI via a pre-trained Video VAE and injects latent light features into DiT blocks, leveraging the base model's generative prior without catastrophic forgetting. Experiments show that RelightMaster generates physically plausible lighting and shadows and preserves the original scene content. Demos are available at https://wkbian.github.io/Projects/RelightMaster/.
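The abstract specifies MPLI's structure (K depth-aligned planes encoding 3D light source positions, intensities, and colors) but not how the planes are built. The sketch below shows one plausible construction, assuming each point light is splatted as a Gaussian onto the plane nearest its depth; the plane count, depth range, splat radius, and the simplified pinhole projection are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def build_mpli(lights, K=8, H=64, W=64, near=0.5, far=10.0, sigma=4.0):
    """Rasterize point lights into K depth-aligned RGB planes.

    lights: list of dicts with 'pos' (x, y, z in camera space; z is depth),
            'color' (RGB in [0, 1]), and 'intensity' (scalar).
    Returns an array of shape (K, H, W, 3).
    """
    planes = np.zeros((K, H, W, 3), dtype=np.float32)
    depths = np.linspace(near, far, K)        # one nominal depth per plane
    ys, xs = np.mgrid[0:H, 0:W]
    for light in lights:
        x, y, z = light["pos"]
        k = int(np.abs(depths - z).argmin())  # assign light to nearest plane
        # Toy pinhole projection mapping camera rays to pixel coordinates.
        u = (x / z * 0.5 + 0.5) * (W - 1)
        v = (y / z * 0.5 + 0.5) * (H - 1)
        splat = np.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2 * sigma ** 2))
        rgb = np.asarray(light["color"], np.float32) * light["intensity"]
        planes[k] += splat[..., None] * rgb   # accumulation gives multi-light support
    return planes

# Two lights at different depths land on different planes.
mpli = build_mpli([
    {"pos": (0.2, -0.1, 2.0), "color": (1.0, 0.9, 0.7), "intensity": 1.5},
    {"pos": (-0.5, 0.3, 6.0), "color": (0.3, 0.4, 1.0), "intensity": 0.8},
])
print(mpli.shape)  # (8, 64, 64, 3)
```

Because the representation is just stacked images, any number of lights can be accumulated, and a light configuration never seen in training still produces a valid input, which is consistent with the claimed multi-source support and generalization to unseen setups.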
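Similarly, the Light Image Adapter is described only at a high level: MPLI is compressed by the pre-trained Video VAE, and the resulting latents are injected into DiT blocks while preserving the base model's prior. A common way to add such a condition without disturbing a pretrained network is a zero-initialized residual projection, sketched below as a minimal PyTorch module; the class name, dimensions, and injection point are hypothetical, not the paper's confirmed implementation.

```python
import torch
import torch.nn as nn

class LightImageAdapter(nn.Module):
    """Inject VAE-encoded MPLI latents into a DiT block's hidden states.

    The zero-initialized projection makes the adapter an exact no-op at the
    start of training, so the pretrained video prior is initially untouched
    (a standard trick for avoiding catastrophic forgetting).
    """
    def __init__(self, light_latent_dim: int, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(light_latent_dim, hidden_dim)
        nn.init.zeros_(self.proj.weight)  # zero init -> identity mapping at step 0
        nn.init.zeros_(self.proj.bias)

    def forward(self, hidden: torch.Tensor, light_latent: torch.Tensor):
        # hidden:       (B, N_tokens, hidden_dim) activations inside a DiT block
        # light_latent: (B, N_tokens, light_latent_dim) tokenized MPLI latents
        return hidden + self.proj(light_latent)

# Illustrative shapes: 256 video tokens, 16-dim VAE latents, 1024-dim DiT width.
adapter = LightImageAdapter(light_latent_dim=16, hidden_dim=1024)
hidden = torch.randn(1, 256, 1024)
light_latent = torch.randn(1, 256, 16)
print(adapter(hidden, light_latent).shape)  # torch.Size([1, 256, 1024])
```

Reusing the base model's Video VAE to compress MPLI means the lighting condition lives in the same latent space as the video itself, so the adapter only has to learn a lightweight mapping rather than a new encoder.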