Capture, Canonicalize, Splat: Zero-Shot 3D Gaussian Avatars from Unstructured Phone Images

📅 2025-10-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of reconstructing high-fidelity, identity-consistent 3D avatars from unconstrained mobile phone photographs, a setting in which existing methods suffer from geometric inconsistency, identity degradation, and loss of high-frequency details (e.g., wrinkles, fine hair). It proposes a "Capture, Canonicalize, Splat" pipeline: (1) a generative canonicalization module maps arbitrary-view phone images to a standardized, consistent representation without explicit pose annotations; (2) a Transformer-based 3D Gaussian splatting network is trained end-to-end on a large-scale dataset derived from dome captures of real people. The method requires no multi-view registration or pose supervision, enabling zero-shot generation of static quarter-body 3D Gaussian avatars. It improves geometric consistency, identity fidelity, and the realism of high-frequency surface details such as skin texture and hair, while keeping identity stable under uncontrolled capture conditions.

📝 Abstract
We present a novel, zero-shot pipeline for creating hyperrealistic, identity-preserving 3D avatars from a few unstructured phone images. Existing methods face several challenges: single-view approaches suffer from geometric inconsistencies and hallucinations, degrading identity preservation, while models trained on synthetic data fail to capture high-frequency details like skin wrinkles and fine hair, limiting realism. Our method introduces two key contributions: (1) a generative canonicalization module that processes multiple unstructured views into a standardized, consistent representation, and (2) a transformer-based model trained on a new, large-scale dataset of high-fidelity Gaussian splatting avatars derived from dome captures of real people. This "Capture, Canonicalize, Splat" pipeline produces static quarter-body avatars with compelling realism and robust identity preservation from unstructured photos.
Problem

Research questions and friction points this paper is trying to address.

Creating 3D avatars from a few unstructured phone images in a zero-shot manner, without per-subject training
Solving geometric inconsistencies and identity degradation in single-view methods
Capturing high-frequency details like skin wrinkles for realistic avatars
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative canonicalization module processes multiple unstructured views
Transformer model trained on high-fidelity Gaussian splatting avatars
Pipeline creates realistic 3D avatars from unstructured phone images
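
The summary does not say how the avatar's Gaussians are parameterized. As a minimal sketch, assuming the standard 3D Gaussian splatting formulation (each splat has a center, per-axis scale, rotation quaternion, opacity, and color, with covariance Σ = R S Sᵀ Rᵀ), the kind of per-Gaussian output such a model might predict could look as follows. The class and function names here are illustrative and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gaussian3D:
    """One splat in the standard 3DGS parameterization (hypothetical names)."""
    mean: Tuple[float, float, float]             # center in canonical space
    scale: Tuple[float, float, float]            # per-axis standard deviations
    rotation: Tuple[float, float, float, float]  # quaternion (w, x, y, z)
    opacity: float                               # in [0, 1]
    color: Tuple[float, float, float]            # RGB; real systems often store SH coefficients


def quat_to_matrix(q: Tuple[float, float, float, float]) -> List[List[float]]:
    """Convert a (w, x, y, z) quaternion to a 3x3 rotation matrix, normalizing first."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(g: Gaussian3D) -> List[List[float]]:
    """Sigma = R S S^T R^T with S = diag(scale); symmetric positive semi-definite."""
    R = quat_to_matrix(g.rotation)
    s2 = [s * s for s in g.scale]
    return [
        [sum(R[i][k] * s2[k] * R[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]
```

Building the covariance from a rotation and a scale, rather than predicting its six free entries directly, keeps it a valid covariance matrix for any network output, which is the usual reason for this factorization.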
Authors

Emanuel Garbin
Meta

Guy Adam
Meta

Oded Krams
Meta

Zohar Barzelay
Meta

Eran Guendelman
Meta

Michael Schwarz
Meta

Matteo Presutto
Meta

Moran Vatelmacher
Meta

Yigal Shenkman
Meta

Eli Peker
Meta

Itai Druker
Computer Vision, Meta
Graphics, Vision, Avatars and Robotics

Uri Patish
Meta

Yoav Blum
Meta

Max Bluvstein
Meta

Junxuan Li
Research Scientist, Codec Avatars Lab, Meta
Computer Vision

Rawal Khirodkar
Research Scientist, Meta AI
Machine Learning, Computer Vision, Digital Human

Shunsuke Saito
Research Scientist, Meta Codec Avatars Lab
Digital Humans, Computer Vision, Computer Graphics