Virtually Being: Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures

📅 2025-10-15
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses three key challenges in virtual production using video diffusion models: preserving multi-view identity consistency, enabling 3D camera controllability, and supporting multi-subject customization. To this end, we propose a joint optimization framework that (1) leverages 4D Gaussian Splatting (4DGS) to re-render volumetric performance captures along diverse camera trajectories, augmented with video-relighting-based data enhancement to diversify illumination conditions; (2) incorporates a noise fusion mechanism for efficient, inference-time multi-character composition; and (3) jointly fine-tunes open-source video diffusion models to enhance illumination adaptability and spatial layout control. Experiments demonstrate significant improvements over state-of-the-art methods in multi-view identity preservation, camera trajectory fidelity, and illumination robustness. The method also achieves superior video quality and personalized subject accuracy, establishing a new paradigm for high-fidelity, controllable video generation in virtual production.
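The summary names a noise fusion mechanism for inference-time multi-character composition but gives no details. The sketch below is a minimal illustration of one plausible reading, assuming two independently customized denoisers whose noise predictions are blended with soft per-character spatial masks at each sampling step; every function and variable name here is hypothetical rather than the paper's actual API.

```python
import torch

@torch.no_grad()
def blended_denoise_step(denoiser_a, denoiser_b, base_denoiser, x_t, t,
                         mask_a, mask_b):
    """One hypothetical sampling step that fuses noise predictions from two
    independently customized models plus an un-customized base model.

    x_t:    noisy latent video, shape (B, C, T, H, W)
    mask_a: soft spatial mask for subject A, shape (B, 1, T, H, W)
    mask_b: soft spatial mask for subject B, same shape
    """
    eps_a = denoiser_a(x_t, t)        # prediction specialized to subject A
    eps_b = denoiser_b(x_t, t)        # prediction specialized to subject B
    eps_bg = base_denoiser(x_t, t)    # base model covers the background

    # Blend per region; the background weight is whatever the subject masks leave over.
    w_bg = (1.0 - mask_a - mask_b).clamp(min=0.0)
    eps = mask_a * eps_a + mask_b * eps_b + w_bg * eps_bg
    # Renormalize in case the soft masks overlap.
    return eps / (mask_a + mask_b + w_bg).clamp(min=1e-6)


# Toy usage with stand-in denoisers (real ones would be fine-tuned video diffusion models).
x_t = torch.randn(1, 4, 8, 32, 32)
mask_a = torch.zeros(1, 1, 8, 32, 32)
mask_a[..., :16] = 1.0               # left half of the frame belongs to subject A
mask_b = 1.0 - mask_a                # right half belongs to subject B
dummy = lambda x, t: torch.randn_like(x)
eps = blended_denoise_step(dummy, dummy, dummy, x_t, t=500,
                           mask_a=mask_a, mask_b=mask_b)
```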

📝 Abstract
We introduce a framework that enables both multi-view character consistency and 3D camera control in video diffusion models through a novel customization data pipeline. We train the character consistency component on recorded volumetric capture performances re-rendered along diverse camera trajectories via 4D Gaussian Splatting (4DGS), with lighting variability obtained from a video relighting model. We fine-tune state-of-the-art open-source video diffusion models on this data to provide strong multi-view identity preservation, precise camera control, and lighting adaptability. Our framework also supports core capabilities for virtual production, including multi-subject generation through two approaches: joint training and noise blending, the latter enabling efficient composition of independently customized models at inference time. It further achieves scene and real-life video customization, as well as control over motion and spatial layout during customization. Extensive experiments show improved video quality, higher personalization accuracy, and enhanced camera control and lighting adaptability, advancing the integration of video generation into virtual production. Our project page is available at: https://eyeline-labs.github.io/Virtually-Being.
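As a rough illustration of the camera-trajectory side of this data pipeline, the sketch below samples orbit-style paths around a captured performer and returns camera-to-world matrices that a 4DGS renderer could consume. The orbit parameterization, the OpenGL-style pose convention, and all helper names are assumptions made for illustration; the abstract does not specify how trajectories are sampled.

```python
import numpy as np

def look_at(cam_pos, target, up=np.array([0.0, 1.0, 0.0])):
    """Build a 4x4 camera-to-world matrix looking from cam_pos toward target."""
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    c2w = np.eye(4)
    c2w[:3, 0] = right
    c2w[:3, 1] = true_up
    c2w[:3, 2] = -forward            # camera looks down -z (OpenGL convention, assumed)
    c2w[:3, 3] = cam_pos
    return c2w

def sample_orbit_trajectory(center, radius, height, n_frames,
                            start_deg=0.0, sweep_deg=90.0):
    """Sample an orbital camera path around a performer (illustrative only)."""
    angles = np.deg2rad(np.linspace(start_deg, start_deg + sweep_deg, n_frames))
    poses = [look_at(center + np.array([radius * np.cos(a), height, radius * np.sin(a)]),
                     center)
             for a in angles]
    return np.stack(poses)           # (n_frames, 4, 4), one pose per rendered frame

# Example: a 49-frame quarter orbit at 2.5 m radius and roughly eye-level height.
trajectory = sample_orbit_trajectory(center=np.zeros(3), radius=2.5,
                                     height=1.6, n_frames=49)
```

Each rendered clip, paired with its trajectory, could then be passed through a video relighting model to add the lighting variability the abstract describes.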
Problem

Research questions and friction points this paper is trying to address.

Achieving multi-view character consistency in video generation
Enabling precise 3D camera control in video diffusion models
Supporting lighting adaptability and multi-subject virtual production
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-view character consistency via 4D Gaussian Splatting
Camera control and lighting adaptability in video diffusion (see the camera-conditioning sketch after this list)
Multi-subject generation through joint training and noise blending
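Neither the abstract nor the summary says how camera pose is injected into the diffusion model. One common choice in camera-controllable video diffusion is a per-pixel Plücker ray embedding, sketched below purely as an assumed example of such conditioning; the function name and conventions are hypothetical and not confirmed to be what this paper uses.

```python
import torch

def plucker_embedding(K, c2w, height, width):
    """Per-pixel Plücker ray map (6 channels) for one camera.

    K: (3, 3) intrinsics; c2w: (4, 4) camera-to-world pose.
    Returns a (6, H, W) tensor of [ray direction, moment] per pixel, a common
    camera-conditioning signal for video diffusion (assumption, not necessarily
    this paper's scheme).
    """
    ys, xs = torch.meshgrid(torch.arange(height, dtype=torch.float32),
                            torch.arange(width, dtype=torch.float32),
                            indexing="ij")
    pix = torch.stack([xs + 0.5, ys + 0.5, torch.ones_like(xs)], dim=-1)   # (H, W, 3)

    # Unproject pixel centers to camera-space ray directions, then rotate to world space.
    dirs = pix @ torch.linalg.inv(K).T
    dirs = dirs @ c2w[:3, :3].T
    dirs = dirs / dirs.norm(dim=-1, keepdim=True)

    origin = c2w[:3, 3].expand_as(dirs)              # camera center, shared by all pixels
    moment = torch.cross(origin, dirs, dim=-1)       # Plücker moment o x d

    return torch.cat([dirs, moment], dim=-1).permute(2, 0, 1)              # (6, H, W)

# Example: embedding for a 256x256 frame with identity pose and a 200 px focal length.
K = torch.tensor([[200.0, 0.0, 128.0], [0.0, 200.0, 128.0], [0.0, 0.0, 1.0]])
emb = plucker_embedding(K, torch.eye(4), 256, 256)   # shape (6, 256, 256)
```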
Yuancheng Xu
Eyeline Labs, United States of America

Wenqi Xian
Netflix Eyeline Studios
Computer Vision, Computer Graphics

Li Ma
Eyeline Labs, United States of America

Julien Philip
Lead Research Scientist, Netflix Eyeline Studios
Computer Graphics, Image Based Rendering, Relighting, Machine Learning, Neural Rendering

Ahmet Levent Taşel
Eyeline Labs, Canada

Yiwei Zhao
Netflix, United States of America

Ryan Burgert
Eyeline Labs, United States of America

Mingming He
Netflix
Computer Vision, Computer Graphics

Oliver Hermann
Eyeline Labs, Germany

Oliver Pilarski
Eyeline Labs, Germany

Rahul Garg
Netflix, United States of America

Paul Debevec
Professor of Computer Science, University of Southern California
computer graphics, computer vision, cultural heritage, visual effects, virtual reality

Ning Yu
Eyeline Labs, United States of America