SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians

📅 2025-04-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses real-time, high-fidelity 3D head reconstruction from monocular images or videos under the constraint that large-scale real 3D annotations are scarce. We propose a fully self-supervised learning framework. Our core innovation is to introduce differentiable 2D Gaussians as rendering primitives, rigged directly to the 3DMM mesh, replacing conventional differentiable mesh rendering to improve geometric consistency and facial expression modeling. The method jointly trains the 3DMM parameter prediction network and the Gaussian rigging network by backpropagating photometric consistency losses through differentiable Gaussian rendering. Evaluated on the NoW neutral-face benchmark and a newly constructed non-neutral expression benchmark, our approach achieves state-of-the-art geometric accuracy, and the reconstructed meshes significantly outperform existing methods on downstream emotion classification.

📝 Abstract
Accurate, real-time 3D reconstruction of human heads from monocular images and videos underlies numerous visual applications. As 3D ground truth data is hard to come by at scale, previous methods have sought to learn from abundant 2D videos in a self-supervised manner. Typically, this involves the use of differentiable mesh rendering, which is effective but faces limitations. To improve on this, we propose SHeaP (Self-supervised Head Geometry Predictor Learned via 2D Gaussians). Given a source image, we predict a 3DMM mesh and a set of Gaussians that are rigged to this mesh. We then reanimate this rigged head avatar to match a target frame, and backpropagate photometric losses to both the 3DMM and Gaussian prediction networks. We find that using Gaussians for rendering substantially improves the effectiveness of this self-supervised approach. Training solely on 2D data, our method surpasses existing self-supervised approaches in geometric evaluations on the NoW benchmark for neutral faces and a new benchmark for non-neutral expressions. Our method also produces highly expressive meshes, outperforming state-of-the-art in emotion classification.
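The training loop described in the abstract—render Gaussians rigged to the predicted mesh, compare against a target frame, and backpropagate a photometric loss—can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the paper's implementation: it splats isotropic 2D Gaussians by front-to-back alpha compositing and measures an L1 photometric loss, standing in for the paper's full differentiable Gaussian renderer and 3DMM networks.

```python
import numpy as np

def render_gaussians(means, scales, colors, opacities, h, w):
    """Splat isotropic 2D Gaussians onto an (h, w) RGB image by
    front-to-back alpha compositing. A toy stand-in for the
    differentiable 2D Gaussian renderer used in SHeaP."""
    ys, xs = np.mgrid[0:h, 0:w]
    image = np.zeros((h, w, 3))
    transmittance = np.ones((h, w))  # how much light still passes each pixel
    for mu, s, c, o in zip(means, scales, colors, opacities):
        # squared distance of every pixel to this Gaussian's center (x, y)
        d2 = (xs - mu[0]) ** 2 + (ys - mu[1]) ** 2
        alpha = o * np.exp(-0.5 * d2 / s ** 2)
        image += (transmittance * alpha)[..., None] * c
        transmittance *= 1.0 - alpha
    return image

def photometric_loss(rendered, target):
    """L1 photometric loss: the self-supervision signal that would be
    backpropagated to the 3DMM and Gaussian prediction networks."""
    return np.abs(rendered - target).mean()
```

In the actual method this rendering and loss are differentiable end-to-end (e.g. in an autodiff framework), so gradients flow from the photometric loss back into both prediction networks.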
Problem

Research questions and friction points this paper is trying to address.

Real-time 3D head reconstruction from monocular images
Self-supervised learning without 3D ground truth data
Improving geometric accuracy and expression classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised learning via 2D Gaussians
3DMM mesh and Gaussian rigging prediction
Photometric loss backpropagation for refinement
Liam Schoneveld
Woven by Toyota
Zhe Chen
Woven by Toyota
Davide Davoli
Toyota Motor Europe NV/SA (associated partner, via contracted service)
Jiapeng Tang
Technical University of Munich
3D Reconstruction · Computer Vision · Generative Models
Saimon Terazawa
Woven by Toyota
Ko Nishino
Professor, Kyoto University
Computer Vision · Artificial Intelligence · Machine Learning
Matthias Niessner
Technical University of Munich