🤖 AI Summary
Temporal instability keeps current volumetric video relighting methods short of production readiness. Method: We propose a hybrid framework that integrates diffusion priors with physically based rendering: (1) a diffusion model extracts per-frame intrinsic decompositions and material priors; (2) optical-flow-guided temporal-consistency regularization suppresses inter-frame flicker; and (3) a Gaussian opacity field yields renderable mesh proxies that enable indirect-illumination modeling. Together, these components allow physics-informed relighting within standard graphics pipelines. Results: Experiments on real and synthetic data show that our method substantially improves temporal stability over purely diffusion-based approaches on long sequences, supports extended video durations, and strikes a superior balance between visual fidelity, physical plausibility, and generation quality, marking a critical step toward production-grade volumetric video relighting.
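The flow-guided regularization in step (2) can be pictured as blending each frame's noisy per-frame estimate with the previous smoothed result warped along the optical flow. The sketch below is illustrative only, not the paper's implementation: the function names, the nearest-neighbor warp, and the exponential blending weight `alpha` are all assumptions.

```python
import numpy as np

def warp_by_flow(prev, flow):
    """Backward-warp `prev` (H, W, C) by a per-pixel flow field (H, W, 2).
    Nearest-neighbor sampling keeps the sketch short; a real pipeline
    would use bilinear sampling and occlusion masks."""
    H, W = prev.shape[:2]
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_y = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, W - 1)
    return prev[src_y, src_x]

def temporally_regularize(estimates, flows, alpha=0.8):
    """Suppress inter-frame flicker by blending each frame's estimate
    with the flow-warped running result:
        out_t = alpha * warp(out_{t-1}, flow_t) + (1 - alpha) * est_t
    `estimates` is a list of T (H, W, C) maps; `flows` has T-1 entries."""
    out = [estimates[0]]
    for est, flow in zip(estimates[1:], flows):
        warped = warp_by_flow(out[-1], flow)
        out.append(alpha * warped + (1 - alpha) * est)
    return out
```

With zero flow and identical inputs, the output reproduces the input exactly, which is the sanity check one would expect from any temporal filter of this form.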
📝 Abstract
Volumetric video relighting is essential for bringing captured performances into virtual worlds, but current approaches struggle to deliver temporally stable, production-ready results. Diffusion-based intrinsic decomposition methods show promise for single frames, yet suffer from stochastic noise and instability when extended to sequences, while video diffusion models remain constrained by memory and scale. We propose a hybrid relighting framework that combines diffusion-derived material priors with temporal regularization and physically motivated rendering. Our method aggregates multiple stochastic estimates of per-frame material properties into temporally consistent shading components, using optical-flow-guided regularization. For indirect effects such as shadows and reflections, we extract a mesh proxy from Gaussian Opacity Fields and render it within a standard graphics pipeline. Experiments on real and synthetic captures show that this hybrid strategy achieves substantially more stable relighting across sequences than diffusion-only baselines, while scaling beyond the clip lengths feasible for video diffusion. These results indicate that hybrid approaches, which balance learned priors with physically grounded constraints, are a practical step toward production-ready volumetric video relighting.
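The abstract's idea of aggregating multiple stochastic diffusion estimates per frame can be sketched as a per-pixel robust reduction over K samples. This is a minimal illustration under assumed choices (the function name and the use of a median rather than whatever statistic the paper employs are mine):

```python
import numpy as np

def aggregate_stochastic_estimates(samples):
    """Collapse K stochastic per-frame material estimates, each (H, W, C),
    into one map via the per-pixel median. The median damps outlier
    diffusion samples more robustly than a plain mean."""
    return np.median(np.stack(samples, axis=0), axis=0)
```

The point of the aggregation step is that the stochastic noise of individual diffusion samples averages out before the temporal regularizer ever sees the sequence.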