FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning

📅 2026-03-05

🤖 AI Summary

This work addresses the geometric distortions and visual artifacts that camera-controlled video generation often produces on monocular portrait videos, which stem from scale-ambiguous camera representations or 3D reconstruction errors. The authors propose a face-tailored, scale-aware camera representation that provides deterministic conditioning without relying on 3D priors. A video generation model is trained on both multi-view studio captures and in-the-wild monocular videos, using two camera-control data generation strategies, synthetic camera motion and multi-shot stitching, so that footage from stationary training cameras generalizes to dynamic, continuous camera trajectories at inference time. On the Ava-256 dataset and diverse in-the-wild videos, the authors report better camera controllability, visual quality, and identity and motion preservation than existing methods.

📝 Abstract

We introduce FaceCam, a system that generates video under customizable camera trajectories from a monocular human portrait video input. Recent camera control approaches based on large video-generation models have shown promising progress but often exhibit geometric distortions and visual artifacts on portrait videos due to scale-ambiguous camera representations or 3D reconstruction errors. To overcome these limitations, we propose a face-tailored scale-aware representation for camera transformations that provides deterministic conditioning without relying on 3D priors. We train a video generation model on both multi-view studio captures and in-the-wild monocular videos, and introduce two camera-control data generation strategies, synthetic camera motion and multi-shot stitching, to exploit stationary training cameras while generalizing to dynamic, continuous camera trajectories at inference time. Experiments on the Ava-256 dataset and diverse in-the-wild videos demonstrate that FaceCam achieves superior performance in camera controllability, visual quality, and identity and motion preservation.
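
The scale ambiguity the abstract refers to can be illustrated with a generic pinhole camera. The sketch below is not FaceCam's representation, which the abstract does not specify; the function names, points, and camera translation are all hypothetical values chosen for illustration. It shows that scaling the scene and the camera translation by the same factor leaves the image unchanged, so a raw translation vector alone does not determine how far the camera appears to move.

```python
# Generic pinhole-projection demo of monocular scale ambiguity.
# All names and values are illustrative assumptions, not taken from FaceCam.

def project(point, translation, focal=1.0):
    """Project a 3D point for a camera at `translation` with identity rotation."""
    x = point[0] - translation[0]
    y = point[1] - translation[1]
    z = point[2] - translation[2]
    return (focal * x / z, focal * y / z)


def scale_all(vec, s):
    """Multiply every component of a 3-vector by s."""
    return tuple(s * v for v in vec)


# Hypothetical scene points in front of the camera, and a camera translation.
POINTS = [(0.1, 0.2, 2.0), (-0.3, 0.05, 2.5), (0.0, -0.1, 3.0)]
CAMERA_T = (0.2, 0.0, 0.5)


def joint_scaling_gap(s):
    """Largest image-space change when scene AND translation are scaled by s."""
    gap = 0.0
    for p in POINTS:
        u0, v0 = project(p, CAMERA_T)
        u1, v1 = project(scale_all(p, s), scale_all(CAMERA_T, s))
        gap = max(gap, abs(u0 - u1), abs(v0 - v1))
    return gap


def translation_only_gap(s):
    """Largest image-space change when ONLY the translation is scaled by s."""
    gap = 0.0
    for p in POINTS:
        u0, v0 = project(p, CAMERA_T)
        u1, v1 = project(p, scale_all(CAMERA_T, s))
        gap = max(gap, abs(u0 - u1), abs(v0 - v1))
    return gap


if __name__ == "__main__":
    print("joint scaling gap:", joint_scaling_gap(3.0))
    print("translation-only gap:", translation_only_gap(2.0))
```

The joint-scaling gap is zero up to floating-point error, while scaling only the translation changes the projection. The same numeric translation therefore corresponds to different apparent motion depending on the unknown scene scale, which is the kind of ambiguity a scale-aware conditioning signal is meant to remove.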

Problem

Research questions and friction points this paper is trying to address.

camera control

portrait video

geometric distortion

visual artifacts

scale ambiguity

Innovation

Methods, ideas, or system contributions that make the work stand out.

scale-aware conditioning

camera control

portrait video generation

monocular video

3D-prior-free
