LightHeadEd: Relightable & Editable Head Avatars from a Smartphone

📅 2025-04-13
🤖 AI Summary
High-fidelity, relightable, and editable 3D head reconstruction has traditionally relied on costly multi-camera systems; this paper proposes a lightweight alternative based on a single smartphone, requiring only a polarizing filter, a point light source, and a darkroom environment for dynamic facial video capture. Methodologically, the authors introduce a polarization-based separation of the skin’s diffuse and specular reflectance, propose a hybrid representation embedding 2D Gaussians in UV space, and design a neural analysis-by-synthesis framework that explicitly decouples geometric deformation from appearance. Key technical components include cross- and co-polarized video acquisition, parametric UV mapping, differentiable ray tracing, environment light map estimation, and a newly curated multi-subject facial motion dataset. Experiments demonstrate that on real smartphone-captured data the method achieves geometric and material fidelity comparable to a Light Stage, enabling real-time rendering, arbitrary relighting, and pose- or expression-driven animation.
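The polarized separation described above exploits a standard optical fact: a cross-polarized filter blocks mirror-like (specular) reflection, so that stream approximates the diffuse component alone, while the co-polarized stream contains both. The sketch below illustrates this subtraction on image arrays; it is a minimal illustration of the principle, not the paper's actual pipeline, and the function name is our own.

```python
import numpy as np

def separate_reflectance(co_polarized, cross_polarized):
    """Illustrative diffuse/specular separation from a polarized image
    pair (a sketch of the principle, not the paper's exact method).

    cross_polarized ~ diffuse only (specular reflection is blocked)
    co_polarized    ~ diffuse + specular
    """
    diffuse = cross_polarized
    # Specular is what the cross-polarizer removed; clamp negatives
    # that arise from noise in real captures.
    specular = np.clip(co_polarized - cross_polarized, 0.0, None)
    return diffuse, specular

# Toy single-pixel example: co-polarized is brighter than cross.
co = np.array([[0.8]])
cross = np.array([[0.5]])
d, s = separate_reflectance(co, cross)
```

In practice the two streams must be photometrically aligned before subtraction, which is part of why the capture happens in a darkroom with a single known point light.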

📝 Abstract
Creating photorealistic, animatable, and relightable 3D head avatars traditionally requires an expensive Light Stage with multiple calibrated cameras, making it inaccessible for widespread adoption. To bridge this gap, we present a novel, cost-effective approach for creating high-quality relightable head avatars using only a smartphone equipped with polaroid filters. Our approach involves simultaneously capturing cross-polarized and parallel-polarized video streams in a dark room with a single point-light source, separating the skin's diffuse and specular components during dynamic facial performances. We introduce a hybrid representation that embeds 2D Gaussians in the UV space of a parametric head model, facilitating efficient real-time rendering while preserving high-fidelity geometric details. Our learning-based neural analysis-by-synthesis pipeline decouples pose- and expression-dependent geometric offsets from appearance, decomposing the surface into albedo, normal, and specular UV texture maps, along with environment maps. We collect a unique dataset of various subjects performing diverse facial expressions and head movements.
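Once the surface is decomposed into albedo and normal UV maps as the abstract describes, relighting under a new light becomes a per-texel shading computation. The snippet below shows the simplest such case, Lambertian diffuse shading under a single directional light; it is a hedged sketch of why the decomposition enables relighting, not the paper's differentiable ray tracer, and all names here are illustrative.

```python
import numpy as np

def relight_diffuse(albedo, normals, light_dir, light_color):
    """Minimal Lambertian relighting from decomposed texture maps.

    albedo:      (H, W, 3) diffuse albedo map
    normals:     (H, W, 3) unit surface normals
    light_dir:   (3,) unit vector pointing toward the light
    light_color: (3,) RGB light intensity
    """
    # Per-texel cosine term, clamped to the upper hemisphere.
    n_dot_l = np.clip(normals @ light_dir, 0.0, None)  # (H, W)
    return albedo * n_dot_l[..., None] * light_color

# Toy maps: uniform gray albedo, all normals facing +z.
albedo = np.full((2, 2, 3), 0.6)
normals = np.zeros((2, 2, 3))
normals[..., 2] = 1.0
out = relight_diffuse(albedo, normals,
                      np.array([0.0, 0.0, 1.0]), np.ones(3))
```

The paper's full model additionally handles specular reflectance and environment maps, but the same principle applies: because lighting enters only through this shading step, the light can be swapped arbitrarily at render time.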
Problem

Research questions and friction points this paper is trying to address.

Creating affordable relightable head avatars from smartphones
Separating diffuse and specular components in facial performances
Decoupling pose and expression from appearance for realistic rendering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Smartphone with polaroid filters captures polarized videos
Hybrid UV-embedded 2D Gaussians enable real-time rendering
Neural pipeline decouples geometry and appearance for relighting
👥 Authors
Pranav Manu (IIIT Hyderabad)
Astitva Srivastava (IIIT Hyderabad)
Amit Raj (Google Research)
Varun Jampani (Stability AI)
Avinash Sharma (IIT Jodhpur)
P. J. Narayanan (IIIT Hyderabad)