Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset

📅 2025-03-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Video portrait relighting faces a fundamental trade-off between photorealism and temporal stability. Existing approaches rely heavily on high-quality paired multi-illumination video data, severely limiting their generalizability and practical applicability. To address this, we propose the first conditional video diffusion model specifically designed for portrait video relighting. Our method introduces a novel dynamic illumination embedding mechanism and adopts a hybrid training paradigm combining static One-Light-At-a-Time (OLAT) data with uncurated single-illumination in-the-wild videos—eliminating the need for paired multi-illumination sequences. Leveraging spatiotemporal consistency losses and lightweight conditional adaptation of a pre-trained diffusion backbone, our framework enables end-to-end relighting under arbitrary target lighting conditions. Extensive experiments demonstrate state-of-the-art performance in both photorealism and temporal coherence, significantly outperforming existing methods.

📝 Abstract
Video portrait relighting remains challenging because the results need to be both photorealistic and temporally stable. This typically requires a strong model design that can capture complex facial reflections as well as intensive training on a high-quality paired video dataset, such as dynamic one-light-at-a-time (OLAT). In this work, we introduce Lux Post Facto, a novel portrait video relighting method that produces both photorealistic and temporally consistent lighting effects. On the model side, we design a new conditional video diffusion model built upon a state-of-the-art pre-trained video diffusion model, alongside a new lighting injection mechanism that enables precise control. In this way, we leverage its strong spatial and temporal generative capability to produce plausible solutions to the ill-posed relighting problem. Our technique uses a hybrid dataset consisting of static-expression OLAT data and in-the-wild portrait performance videos to jointly learn relighting and temporal modeling. This avoids the need to acquire paired video data under different lighting conditions. Our extensive experiments show that our model produces state-of-the-art results in terms of both photorealism and temporal consistency.
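The abstract describes a lighting injection mechanism that conditions a pre-trained video diffusion backbone on a target illumination. The paper does not specify the operator, but a common way to inject such a condition is cross-attention from the video feature tokens (queries) to lighting embedding tokens (keys/values). The sketch below is a minimal NumPy illustration of that pattern; all names, dimensions, and the random projection weights are placeholders, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inject_lighting(video_tokens, light_tokens, d_head=64, seed=0):
    """Condition video features on a lighting embedding via cross-attention.

    video_tokens: (N, d_v) flattened spatiotemporal feature tokens.
    light_tokens: (L, d_l) tokens encoding the target illumination
                  (e.g. an environment-map embedding).
    Projection weights are random stand-ins for learned parameters.
    """
    rng = np.random.default_rng(seed)
    d_v, d_l = video_tokens.shape[-1], light_tokens.shape[-1]
    Wq = rng.standard_normal((d_v, d_head)) / np.sqrt(d_v)
    Wk = rng.standard_normal((d_l, d_head)) / np.sqrt(d_l)
    Wv = rng.standard_normal((d_l, d_v)) / np.sqrt(d_l)
    q = video_tokens @ Wq                       # (N, d_head) queries
    k = light_tokens @ Wk                       # (L, d_head) keys
    v = light_tokens @ Wv                       # (L, d_v)    values
    attn = softmax(q @ k.T / np.sqrt(d_head))   # (N, L) attention over lights
    return video_tokens + attn @ v              # residual update, shape kept
```

A residual cross-attention block like this can be inserted into a frozen backbone as a lightweight adapter, which matches the abstract's emphasis on building atop a pre-trained model rather than training from scratch.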
Problem

Research questions and friction points this paper is trying to address.

Relit portrait video must be simultaneously photorealistic and temporally stable, a difficult trade-off
Existing methods depend on paired multi-illumination video data (e.g. dynamic OLAT), which is costly to capture and limits generalization
Relighting is ill-posed: many lighting explanations are consistent with a single input video
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conditional video diffusion model for relighting
Hybrid dataset combining OLAT and in-the-wild videos
Lighting injection mechanism for precise control
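The hybrid-dataset idea above pairs two complementary supervision sources: static OLAT captures teach relighting, while uncurated single-illumination videos teach temporal dynamics. One simple way to realize joint training is to mix the two sources at the batch-sampling level. The sketch below is a hypothetical illustration of that mixing; the function name, tuple layout, and 50/50 ratio are assumptions, not details from the paper.

```python
import random

def sample_hybrid_batch(olat_pairs, wild_clips, p_olat=0.5, rng=None):
    """Draw one training example from the hybrid dataset.

    olat_pairs: (input_frame, target_lighting, relit_frame) tuples from
                static OLAT captures -- supervise the relighting task.
    wild_clips: single-illumination in-the-wild videos -- supervise
                temporal modeling without paired lighting ground truth.
    p_olat:     placeholder mixing ratio between the two sources.
    """
    rng = rng or random.Random()
    if rng.random() < p_olat:
        return ("relight", rng.choice(olat_pairs))   # lighting supervision
    return ("temporal", rng.choice(wild_clips))      # motion supervision
```

Because each branch supplies a different loss target, the training loop would dispatch on the returned tag, applying a relighting loss to OLAT samples and a temporal-consistency loss to in-the-wild clips.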