🤖 AI Summary
This work addresses facial appearance reconstruction from monocular, in-the-wild videos captured under uncontrolled conditions. We propose a lightweight, end-to-end differentiable rendering framework that jointly optimizes facial geometry, diffuse albedo, specular intensity, and specular roughness solely from a monocular video of a natural head rotation, without assuming uniform or simplified environmental illumination. Our method explicitly models visibility, occlusion, and spatially varying lighting. Key innovations include an occlusion-aware lighting inversion mechanism and implicit visibility estimation, which together enable high-fidelity appearance reconstruction without multi-view inputs or studio constraints. Experiments demonstrate that the reconstructed geometry and material maps approach the fidelity of professional multi-view studio captures while significantly reducing acquisition cost and hardware requirements. The framework thus provides a practical foundation for photorealistic virtual-human animation and AR applications.
📝 Abstract
We present a new method for reconstructing the appearance properties of human faces from a lightweight capture procedure in an unconstrained environment. Our method recovers the surface geometry, diffuse albedo, specular intensity, and specular roughness from a monocular video of a simple head rotation captured in the wild. Notably, we make no simplifying assumptions about the environment lighting, and we explicitly take visibility and occlusions into account. As a result, our method can produce facial appearance maps that approach the fidelity of studio-based multi-view captures, with a far easier and cheaper procedure.
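To give a feel for the inverse-rendering idea (not the paper's actual model), the toy sketch below fits diffuse albedo, specular intensity, and specular roughness to pixel intensities "observed" under several lighting configurations, standing in for frames of a head rotation. The shading model, the roughness-to-exponent mapping, all parameter values, and the finite-difference gradient descent are illustrative assumptions; a real system would use a physically based BRDF and backpropagate through a differentiable renderer.

```python
def shade(albedo, spec, rough, n_dot_l, n_dot_h):
    # Toy shading: Lambertian diffuse term plus a specular lobe whose
    # sharpness comes from a crude roughness-to-exponent mapping.
    exponent = 1.0 / max(rough, 1e-3)
    return albedo * n_dot_l + spec * (n_dot_h ** exponent) * n_dot_l

# "Ground-truth" appearance we pretend was observed in video frames.
true_params = (0.6, 0.3, 0.4)  # albedo, specular intensity, roughness
# Each (n.l, n.h) pair stands in for the lighting seen at one head pose.
lights = [(0.9, 0.99), (0.7, 0.9), (0.5, 0.8), (0.3, 0.6)]
observed = [shade(*true_params, ndl, ndh) for ndl, ndh in lights]

def loss(p):
    # Photometric error between re-rendered and observed intensities.
    return sum((shade(*p, ndl, ndh) - obs) ** 2
               for (ndl, ndh), obs in zip(lights, observed))

# Gradient descent with finite-difference gradients, a stand-in for
# automatic differentiation through a real differentiable renderer.
params = [0.2, 0.2, 0.2]
eps, lr = 1e-6, 0.1
for _ in range(20000):
    base = loss(params)
    grads = []
    for i in range(len(params)):
        bumped = list(params)
        bumped[i] += eps
        grads.append((loss(bumped) - base) / eps)
    params = [p - lr * g for p, g in zip(params, grads)]

print("recovered:", [round(p, 3) for p in params])
```

Because each observation mixes the same unknowns under different lighting, the head rotation is what makes diffuse and specular contributions separable; with a single view the decomposition would be ambiguous.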