Capture, Canonicalize, Splat: Zero-Shot 3D Gaussian Avatars from Unstructured Phone Images

📅 2025-10-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the reconstruction of high-fidelity, identity-consistent 3D avatars from a few unstructured phone photographs. In this setting, single-view methods suffer from geometric inconsistencies and hallucinations that degrade identity, and models trained on synthetic data miss high-frequency details such as skin wrinkles and fine hair. The paper proposes a "Capture, Canonicalize, Splat" pipeline with two components: (1) a generative canonicalization module that converts multiple unstructured views into a standardized, consistent representation; and (2) a transformer-based model trained on a new, large-scale dataset of high-fidelity Gaussian splatting avatars derived from dome captures of real people. The pipeline is zero-shot and produces static quarter-body 3D Gaussian avatars with compelling realism and robust identity preservation.

📝 Abstract

We present a novel, zero-shot pipeline for creating hyperrealistic, identity-preserving 3D avatars from a few unstructured phone images. Existing methods face several challenges: single-view approaches suffer from geometric inconsistencies and hallucinations, degrading identity preservation, while models trained on synthetic data fail to capture high-frequency details like skin wrinkles and fine hair, limiting realism. Our method introduces two key contributions: (1) a generative canonicalization module that processes multiple unstructured views into a standardized, consistent representation, and (2) a transformer-based model trained on a new, large-scale dataset of high-fidelity Gaussian splatting avatars derived from dome captures of real people. This "Capture, Canonicalize, Splat" pipeline produces static quarter-body avatars with compelling realism and robust identity preservation from unstructured photos.

Problem

Research questions and friction points this paper is trying to address.

Creating 3D avatars from a few unstructured phone images in a zero-shot manner

Solving geometric inconsistencies and identity degradation in single-view methods

Capturing high-frequency details like skin wrinkles for realistic avatars

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative canonicalization module processes multiple unstructured views

Transformer model trained on high-fidelity Gaussian splatting avatars

Pipeline creates realistic 3D avatars from unstructured phone images
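
The two-stage data flow described above (canonicalize the phone images, then predict a Gaussian splatting avatar) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `canonicalize`, `predict_gaussians`, and `build_avatar`, the number of canonical views, and the placeholder outputs are all hypothetical, and the abstract does not specify what the canonicalized representation looks like. The `Gaussian` fields follow the parameterization commonly used in 3D Gaussian splatting (position, rotation, scale, opacity, color), which is assumed here rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]


@dataclass
class Gaussian:
    """One 3D Gaussian primitive (assumed, commonly used parameterization)."""
    mean: Vec3       # center position
    rotation: Quat   # unit quaternion (w, x, y, z)
    scale: Vec3      # per-axis extent
    opacity: float
    color: Vec3      # RGB in [0, 1]


@dataclass
class CanonicalViews:
    """Hypothetical stage-1 output: a fixed-size, standardized set of views."""
    images: List[str]


def canonicalize(phone_images: Sequence[str], num_views: int = 4) -> CanonicalViews:
    """Placeholder for the generative canonicalization module.

    A real module would synthesize consistent views; this stub only shows
    that a variable number of inputs maps to a fixed-size output.
    """
    if not phone_images:
        raise ValueError("at least one input image is required")
    return CanonicalViews(images=[f"canonical_view_{i}" for i in range(num_views)])


def predict_gaussians(views: CanonicalViews, per_view: int = 2) -> List[Gaussian]:
    """Placeholder for the transformer-based model: emits dummy Gaussians."""
    avatar: List[Gaussian] = []
    for v in range(len(views.images)):
        for g in range(per_view):
            avatar.append(
                Gaussian(
                    mean=(float(v), float(g), 0.0),
                    rotation=(1.0, 0.0, 0.0, 0.0),  # identity rotation
                    scale=(0.01, 0.01, 0.01),
                    opacity=1.0,
                    color=(0.5, 0.5, 0.5),
                )
            )
    return avatar


def build_avatar(phone_images: Sequence[str]) -> List[Gaussian]:
    """Feed-forward pipeline: no per-subject optimization step is involved."""
    return predict_gaussians(canonicalize(phone_images))
```

In this sketch, calling `build_avatar(["a.jpg", "b.jpg", "c.jpg"])` returns a flat list of `Gaussian` records. The point of the structure is that stage 2 always sees the same standardized input regardless of how many photos were captured or how they were framed.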

🔎 Similar Papers

HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors