FastAvatar: Instant 3D Gaussian Splatting for Faces from Single Unconstrained Poses

📅 2025-08-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses fast, high-fidelity reconstruction of a 3D Gaussian Splatting (3DGS) model from a single face image taken at an unconstrained pose, while preserving identity and supporting real-time editing. We propose a pose-invariant encoder-decoder architecture that uses a predefined 3D Gaussian template as a structural prior: a feed-forward network encodes the image into an identity-specific latent embedding and decodes it into residuals to the template's geometric and appearance parameters, so reconstruction runs end to end. Reconstruction takes under 10 ms, about 1000x (three orders of magnitude) faster than per-face optimization methods. The latent space also supports real-time identity interpolation and attribute editing, which the abstract reports is not possible with existing feed-forward 3DGS face frameworks. Experiments show significantly better reconstruction quality than existing feed-forward approaches such as GAGAvatar, making real-time, high-fidelity avatar generation and interactive editing practical for consumer and interactive applications.

📝 Abstract

We present FastAvatar, a pose-invariant, feed-forward framework that can generate a 3D Gaussian Splatting (3DGS) model from a single face image in an arbitrary pose in near-instant time (<10 ms). FastAvatar uses a novel encoder-decoder neural network design to achieve both fast fitting and identity preservation regardless of input pose. First, FastAvatar constructs a 3DGS face "template" model from a training dataset of faces with multi-view captures. Second, FastAvatar encodes the input face image into an identity-specific and pose-invariant latent embedding, and decodes this embedding to predict residuals to the structural and appearance parameters of each Gaussian in the template 3DGS model. By only inferring residuals in a feed-forward fashion, model inference is fast and robust. FastAvatar significantly outperforms existing feed-forward face 3DGS methods (e.g., GAGAvatar) in reconstruction quality, and runs 1000x faster than per-face optimization methods (e.g., FlashAvatar, GaussianAvatars, and GASP). In addition, FastAvatar's novel latent space design supports real-time identity interpolation and attribute editing, which is not possible with any existing feed-forward 3DGS face generation framework. FastAvatar's combination of excellent reconstruction quality and speed expands the scope of 3DGS for photorealistic avatar applications in consumer and interactive systems.
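
The template-plus-residual idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 14-value per-Gaussian layout, the additive update, and the names `LinearDecoder`, `make_decoder`, `apply_residuals`, and `reconstruct` are all assumptions made for illustration, and a single linear map stands in for the real decoder network.

```python
import random
from dataclasses import dataclass

# Assumed layout per Gaussian: 3 position, 3 scale, 4 rotation, 1 opacity, 3 color.
PARAMS_PER_GAUSSIAN = 14


@dataclass
class LinearDecoder:
    """Hypothetical stand-in for the decoder: one bias-free linear map."""
    weights: list  # shape [num_outputs][latent_dim]

    def __call__(self, latent):
        # Each output residual is a dot product of one weight row with the latent.
        return [sum(w * z for w, z in zip(row, latent)) for row in self.weights]


def make_decoder(latent_dim, num_gaussians, seed=0):
    """Build a decoder with small fixed-seed random weights (illustrative only)."""
    rng = random.Random(seed)
    num_outputs = num_gaussians * PARAMS_PER_GAUSSIAN
    weights = [[rng.gauss(0.0, 0.01) for _ in range(latent_dim)]
               for _ in range(num_outputs)]
    return LinearDecoder(weights)


def apply_residuals(template, residuals):
    """Add per-Gaussian residuals to the shared template (additive update is assumed)."""
    result = []
    for i, gaussian in enumerate(template):
        offset = i * PARAMS_PER_GAUSSIAN
        result.append([p + residuals[offset + j] for j, p in enumerate(gaussian)])
    return result


def reconstruct(latent, template, decoder):
    """One feed-forward pass: latent -> residuals -> identity-specific Gaussians."""
    return apply_residuals(template, decoder(latent))


# Toy usage: a 4-Gaussian template and an 8-dimensional latent.
template = [[0.0] * PARAMS_PER_GAUSSIAN for _ in range(4)]
decoder = make_decoder(latent_dim=8, num_gaussians=4)
neutral = reconstruct([0.0] * 8, template, decoder)
person = reconstruct([1.0] * 8, template, decoder)
```

Because the network only has to predict offsets from a shared template, a single forward pass replaces per-face optimization; in this toy version a zero latent leaves the template unchanged.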

Problem

Research questions and friction points this paper is trying to address.

Instant 3D Gaussian avatar generation from single unconstrained face pose

Achieving pose-invariant identity preservation with fast feed-forward inference

Replacing slow per-face optimization with real-time inference without sacrificing reconstruction quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Encoder-decoder network for fast Gaussian splatting

Template-based residual prediction for pose invariance

Latent space enables real-time interpolation and editing
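
As a rough illustration of what latent-space interpolation and editing could look like, the sketch below blends two identity embeddings and shifts one along an attribute direction. The summary does not specify how FastAvatar performs either operation; linear blending, the additive edit, and the names `lerp_latent` and `edit_attribute` are assumptions made for this example.

```python
def lerp_latent(z_a, z_b, t):
    """Linearly blend two latent embeddings; t=0 returns z_a, t=1 returns z_b."""
    if len(z_a) != len(z_b):
        raise ValueError("latent embeddings must have the same dimension")
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def edit_attribute(z, direction, strength):
    """Move a latent along a (hypothetical) attribute direction by a given strength."""
    if len(z) != len(direction):
        raise ValueError("direction must match the latent dimension")
    return [v + strength * d for v, d in zip(z, direction)]


# Toy usage with 2-dimensional latents.
halfway = lerp_latent([0.0, 2.0], [4.0, 6.0], 0.5)
edited = edit_attribute([1.0, 1.0], [1.0, 0.0], 0.5)
```

Each blended or edited embedding would then be passed through the decoder once, which is why such operations can run in real time with a feed-forward model.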

🔎 Similar Papers

GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers