V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

📅 2025-12-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video generation models struggle to jointly model intrinsic scene properties such as albedo, surface normals, material, and irradiance, and no closed-loop framework yet offers physically interpretable, editable control over them. This work introduces the first end-to-end intrinsic-aware video editing framework. It performs inverse rendering of video into intrinsic channels, synthesizes photorealistic video from those channels, and propagates keyframe edits across the sequence. Methodologically, it is the first to unify video inverse rendering, intrinsic-driven synthesis, and keyframe-conditioned editing. Its core is an interleaved, physically grounded conditioning mechanism that enables intuitive manipulation of any intrinsic modality. The resulting videos are photorealistic and temporally coherent, and edits propagate in a physically plausible manner. Qualitative and quantitative experiments show that it outperforms prior methods on object appearance editing and scene relighting.

📝 Abstract
Large-scale video generation models have shown remarkable potential in modeling photorealistic appearance and lighting interactions in real-world scenes. However, a closed-loop framework that jointly understands intrinsic scene properties (e.g., albedo, normal, material, and irradiance), leverages them for video synthesis, and supports editable intrinsic representations remains unexplored. We present V-RGBX, the first end-to-end framework for intrinsic-aware video editing. V-RGBX unifies three key capabilities: (1) video inverse rendering into intrinsic channels, (2) photorealistic video synthesis from these intrinsic representations, and (3) keyframe-based video editing conditioned on intrinsic channels. At the core of V-RGBX is an interleaved conditioning mechanism that enables intuitive, physically grounded video editing through user-selected keyframes, supporting flexible manipulation of any intrinsic modality. Extensive qualitative and quantitative results show that V-RGBX produces temporally consistent, photorealistic videos while propagating keyframe edits across sequences in a physically plausible manner. We demonstrate its effectiveness in diverse applications, including object appearance editing and scene-level relighting, surpassing the performance of prior methods.
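To make the closed loop concrete (decompose into intrinsics, edit a channel, resynthesize RGB), here is a minimal sketch under a strong simplifying assumption. It uses the classic Lambertian intrinsic-image model, RGB = albedo × irradiance, and a per-pixel heuristic decomposition. It is not V-RGBX's method: the paper uses learned video models and also handles normals and materials. Every name here (`Intrinsics`, `decompose`, `render`, `edit_albedo`, `relight`, `propagate`) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable

Pixel = tuple[float, float, float]  # (r, g, b) in [0, 1]
Frame = list[Pixel]                 # toy frame: a flat list of pixels


@dataclass
class Intrinsics:
    albedo: list[Pixel]      # per-pixel reflectance (material color)
    irradiance: list[float]  # per-pixel scalar shading (gray light assumed)


def render(x: Intrinsics) -> Frame:
    # Toy forward model (Lambertian assumption): RGB = albedo * irradiance.
    return [(r * s, g * s, b * s) for (r, g, b), s in zip(x.albedo, x.irradiance)]


def decompose(frame: Frame) -> Intrinsics:
    # Toy "inverse rendering": treat the brightest channel as shading, so
    # albedo = RGB / shading. Real inverse rendering is ill-posed; V-RGBX learns it.
    albedo, irradiance = [], []
    for r, g, b in frame:
        s = max(r, g, b)
        irradiance.append(s)
        albedo.append((r / s, g / s, b / s) if s > 0 else (0.0, 0.0, 0.0))
    return Intrinsics(albedo, irradiance)


def edit_albedo(x: Intrinsics, mask: list[bool], new: Pixel) -> Intrinsics:
    # Appearance edit: recolor masked pixels and leave the lighting untouched.
    albedo = [new if m else a for a, m in zip(x.albedo, mask)]
    return Intrinsics(albedo, list(x.irradiance))


def relight(x: Intrinsics, gain: float) -> Intrinsics:
    # Relighting edit: scale the irradiance and leave the materials untouched.
    return Intrinsics(list(x.albedo), [s * gain for s in x.irradiance])


def propagate(video: list[Frame], edit: Callable[[Intrinsics], Intrinsics]) -> list[Frame]:
    # Naive propagation: decompose every frame, apply the same intrinsic edit,
    # and re-render. Assumes a static scene so pixel i matches across frames.
    return [render(edit(decompose(f))) for f in video]


if __name__ == "__main__":
    # Two 2-pixel frames; the second is the same scene under dimmer light.
    video = [[(0.8, 0.4, 0.2), (0.5, 0.5, 0.5)],
             [(0.4, 0.2, 0.1), (0.25, 0.25, 0.25)]]
    recolored = propagate(video, lambda x: edit_albedo(x, [True, False], (0.0, 0.0, 1.0)))
    print(recolored)  # pixel 0 turns blue but keeps each frame's own brightness
```

Because the albedo edit never touches irradiance, the recolored pixel stays dimmer in the darker frame, and a `relight` call changes brightness without changing color. This toy version breaks down with moving objects, specular materials, and geometry-dependent shading. Those cases require learned decomposition, synthesis, and keyframe propagation rather than a fixed per-pixel rule.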
Problem

Research questions and friction points this paper is trying to address.

Develops an end-to-end framework for intrinsic-aware video editing
Unifies video inverse rendering, synthesis, and keyframe-based editing
Enables physically grounded manipulation of intrinsic scene properties
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unifies video inverse rendering and synthesis with intrinsic channels
Enables keyframe-based editing through interleaved conditioning mechanism
Supports flexible manipulation of any intrinsic modality while keeping edits temporally consistent