Monocular Facial Appearance Capture in the Wild

📅 2024-12-17
🏛️ arXiv.org
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses facial appearance reconstruction from monocular in-the-wild videos captured under uncontrolled conditions. It proposes a lightweight, end-to-end differentiable rendering framework that jointly optimizes facial geometry, diffuse albedo, specular intensity, and specular roughness from a single monocular video of a simple head rotation, without assuming uniform or simplified environment lighting. The method explicitly models visibility, occlusion, and spatially varying illumination. Key innovations include an occlusion-aware lighting-inversion mechanism and implicit visibility estimation, which enable high-fidelity appearance reconstruction without multi-view inputs or studio constraints. Experiments show that the reconstructed geometry and material maps approach the fidelity of professional multi-view studio captures while substantially reducing acquisition cost and hardware requirements, making the framework a practical basis for photorealistic virtual-human animation and AR applications.
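To make the idea of jointly optimizing appearance parameters against a photometric loss concrete, here is a minimal NumPy sketch. It assumes a toy per-pixel shading model (a linear diffuse-plus-specular combination with known per-frame geometry terms and known lighting) and plain gradient descent; it is not the paper's differentiable renderer, which additionally recovers geometry and roughness and models visibility, occlusion, and spatially varying lighting.

```python
import numpy as np

# Toy inverse-rendering sketch: jointly recover per-pixel diffuse albedo and
# specular intensity from several frames by minimizing a photometric L2 loss.
# Assumptions (not from the paper): shading is linear in the two unknowns,
# and the per-frame geometry/lighting terms below are known.
rng = np.random.default_rng(0)

n_frames, n_pixels = 16, 4
diff_term = rng.uniform(0.2, 1.0, (n_frames, n_pixels))  # e.g. max(N·L, 0)
spec_term = rng.uniform(0.0, 0.5, (n_frames, n_pixels))  # e.g. (N·H)^k

# Ground-truth appearance used to synthesize the "observed" video frames.
albedo_true = rng.uniform(0.3, 0.9, n_pixels)
spec_true = rng.uniform(0.0, 0.3, n_pixels)
observed = albedo_true * diff_term + spec_true * spec_term

# Gradient descent on the photometric loss: the same principle a
# differentiable renderer applies at much larger scale.
albedo = np.full(n_pixels, 0.5)
spec = np.full(n_pixels, 0.1)
lr = 0.1
for _ in range(10000):
    resid = albedo * diff_term + spec * spec_term - observed
    albedo -= lr * 2.0 * (resid * diff_term).mean(axis=0)
    spec -= lr * 2.0 * (resid * spec_term).mean(axis=0)

print(np.max(np.abs(albedo - albedo_true)), np.max(np.abs(spec - spec_true)))
```

Because each pixel here observes the same two unknowns across many frames, the head rotation in the paper plays the same role as the varied `diff_term`/`spec_term` samples: it disambiguates diffuse from specular contributions.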

📝 Abstract
We present a new method for reconstructing the appearance properties of human faces from a lightweight capture procedure in an unconstrained environment. Our method recovers the surface geometry, diffuse albedo, specular intensity and specular roughness from a monocular video containing a simple head rotation in-the-wild. Notably, we make no simplifying assumptions on the environment lighting, and we explicitly take visibility and occlusions into account. As a result, our method can produce facial appearance maps that approach the fidelity of studio-based multi-view captures, but with a far easier and cheaper procedure.
Problem

Research questions and friction points this paper is trying to address.

Recovering facial appearance from monocular video
Estimating geometry and reflectance without lighting assumptions
Handling occlusions for studio-quality capture in unconstrained environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monocular video facial reconstruction
Recovers geometry and appearance maps
No lighting assumptions with occlusion handling
Yingyan Xu
ETH Zürich, DisneyResearch | Studios
Kate Gadola
ETH Zürich
Prashanth Chandran
DisneyResearch | Studios
Sebastian Weiss
Disney Research Zürich
Computer Visualization and Graphics · Deep Learning
Markus H. Gross
ETH Zürich, DisneyResearch | Studios
Gaspard Zoss
DisneyResearch | Studios
Derek Bradley
DisneyResearch | Studios
Computer Graphics · Computer Vision