UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting

📅 2025-06-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses key challenges in single-image/video relighting—namely, inaccurate intrinsic property estimation, poor generalization, and error accumulation in two-stage pipelines—by proposing an end-to-end paradigm that jointly estimates albedo and synthesizes relit output. Departing from conventional decomposition-then-composition frameworks, our method implicitly models complex light-material interactions (e.g., shadows, specularities, transparency), enhancing the disentanglement of reflectance and illumination. Leveraging a video diffusion model, we train on synthetically generated multi-illumination data augmented with large-scale auto-annotated real-world videos. Crucially, temporal consistency is preserved throughout inference. Experiments demonstrate significant improvements in relighting quality across diverse scenes, challenging lighting conditions, and heterogeneous materials. Our approach achieves superior visual fidelity and cross-domain generalization compared to current state-of-the-art methods.

📝 Abstract
We address the challenge of relighting a single image or video, a task that demands precise scene intrinsic understanding and high-quality light transport synthesis. Existing end-to-end relighting models are often limited by the scarcity of paired multi-illumination data, restricting their ability to generalize across diverse scenes. Conversely, two-stage pipelines that combine inverse and forward rendering can mitigate data requirements but are susceptible to error accumulation and often fail to produce realistic outputs under complex lighting conditions or with sophisticated materials. In this work, we introduce a general-purpose approach that jointly estimates albedo and synthesizes relit outputs in a single pass, harnessing the generative capabilities of video diffusion models. This joint formulation enhances implicit scene comprehension and facilitates the creation of realistic lighting effects and intricate material interactions, such as shadows, reflections, and transparency. Trained on synthetic multi-illumination data and extensive automatically labeled real-world videos, our model demonstrates strong generalization across diverse domains and surpasses previous methods in both visual fidelity and temporal consistency.
Problem

Research questions and friction points this paper is trying to address.

Addressing video relighting with joint decomposition and synthesis
Overcoming limitations of paired multi-illumination data scarcity
Enhancing realism in complex lighting and material interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint albedo estimation and relighting synthesis
Utilizes video diffusion models for realism
Trained on synthetic multi-illumination data plus large-scale auto-annotated real-world videos
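The joint decomposition-and-synthesis idea can be sketched with a toy denoising step: albedo and relit-frame channels are stacked into one latent, so a single model pass predicts both, avoiding the error accumulation of a two-stage pipeline. Everything below is illustrative (the stand-in `toy_eps_model`, shapes, and the lighting embedding are assumptions, not the paper's actual architecture):

```python
import numpy as np

# Toy setup: each video frame carries 2 RGB targets (albedo + relit),
# stacked on the channel axis so one denoiser call handles both jointly.
T, C, H, W = 4, 3, 8, 8          # frames, channels, height, width
rng = np.random.default_rng(0)

def toy_eps_model(x, light_embed, t):
    # Stand-in for the video diffusion backbone: any function mapping
    # (noisy joint latent, lighting condition, timestep) -> noise estimate.
    return 0.1 * x

def joint_ddpm_step(x_t, light_embed, t, alpha_bar_t):
    # Single joint pass over the stacked [albedo | relit] latent.
    eps = toy_eps_model(x_t, light_embed, t)
    # Standard DDPM x0 estimate from the predicted noise.
    x0_hat = (x_t - np.sqrt(1 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    # Unstack the two targets: intrinsic estimate and relit output.
    albedo_hat, relit_hat = np.split(x0_hat, 2, axis=1)
    return albedo_hat, relit_hat

x_t = rng.standard_normal((T, 2 * C, H, W))  # noisy joint latent
light = rng.standard_normal(16)              # e.g. a target-lighting embedding
albedo, relit = joint_ddpm_step(x_t, light, t=500, alpha_bar_t=0.3)
print(albedo.shape, relit.shape)             # (4, 3, 8, 8) (4, 3, 8, 8)
```

Because both outputs come from one shared pass, gradients from the relit frames can shape the albedo estimate and vice versa, which is the intuition behind the improved reflectance/illumination disentanglement claimed in the summary.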