🤖 AI Summary
This work addresses fast, high-fidelity reconstruction of a 3D Gaussian Splatting (3DGS) model from a single facial image taken at an unconstrained pose, balancing identity preservation with real-time editability. We propose a pose-invariant encoder-decoder architecture that uses a predefined 3D Gaussian template as a structural prior. A feed-forward network encodes the input into an identity-specific latent space and decodes that embedding into geometric and appearance residuals on the template, so reconstruction is end-to-end and takes under 10 ms. This is roughly three orders of magnitude (about 1000x) faster than per-face optimization-based methods. According to the abstract, no existing feed-forward 3DGS face framework supports the real-time identity interpolation and attribute editing that this latent space enables. Experiments show significantly better reconstruction quality than existing feed-forward approaches, which makes the method suitable for photorealistic avatar generation and interactive editing in consumer and interactive systems.
📝 Abstract
We present FastAvatar, a pose-invariant, feed-forward framework that generates a 3D Gaussian Splatting (3DGS) model from a single face image in an arbitrary pose in near-instant time (<10 ms). FastAvatar uses a novel encoder-decoder neural network design to achieve both fast fitting and identity preservation regardless of input pose. First, FastAvatar constructs a 3DGS face "template" model from a training dataset of faces with multi-view captures. Second, FastAvatar encodes the input face image into an identity-specific and pose-invariant latent embedding, and decodes this embedding to predict residuals to the structural and appearance parameters of each Gaussian in the template 3DGS model. Because the network infers only residuals, in a feed-forward fashion, inference is fast and robust. FastAvatar significantly outperforms existing feed-forward face 3DGS methods (e.g., GAGAvatar) in reconstruction quality, and runs 1000x faster than per-face optimization methods (e.g., FlashAvatar, GaussianAvatars, and GASP). In addition, FastAvatar's novel latent space design supports real-time identity interpolation and attribute editing, which is not possible with any existing feed-forward 3DGS face generation framework. FastAvatar's combination of excellent reconstruction quality and speed expands the scope of 3DGS for photorealistic avatar applications in consumer and interactive systems.
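The template-plus-residual idea and latent-space identity interpolation described above can be illustrated with a small NumPy sketch. This is not FastAvatar's implementation: the paper's decoder is a neural network, whereas the sketch uses a random linear map as a stand-in. The names `NUM_GAUSSIANS`, `LATENT_DIM`, `PARAM_DIM`, `decode`, and `interpolate`, and the 14-value per-Gaussian parameter layout, are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the abstract does not state the real ones.
NUM_GAUSSIANS = 1000   # Gaussians in the shared face template
LATENT_DIM = 64        # width of the identity latent embedding
# Assumed per-Gaussian layout:
# 3 position + 3 scale + 4 rotation (quaternion) + 1 opacity + 3 color
PARAM_DIM = 14

# Stand-in for the 3DGS face template built from multi-view training data.
template = rng.normal(size=(NUM_GAUSSIANS, PARAM_DIM))

# Stand-in decoder: a fixed random linear map (the real decoder is a network).
decoder_weights = rng.normal(scale=0.01, size=(LATENT_DIM, NUM_GAUSSIANS * PARAM_DIM))


def decode(latent: np.ndarray) -> np.ndarray:
    """Map an identity latent to per-Gaussian parameters.

    Only residuals are predicted; they are added to the shared template.
    """
    residuals = (latent @ decoder_weights).reshape(NUM_GAUSSIANS, PARAM_DIM)
    return template + residuals


def interpolate(latent_a: np.ndarray, latent_b: np.ndarray, t: float) -> np.ndarray:
    """Linearly blend two identity latents, with t in [0, 1]."""
    return (1.0 - t) * latent_a + t * latent_b


# Two hypothetical identity embeddings; in the paper these would come from
# the pose-invariant image encoder, which is not modeled here.
latent_a = rng.normal(size=LATENT_DIM)
latent_b = rng.normal(size=LATENT_DIM)

gaussians_a = decode(latent_a)
gaussians_mid = decode(interpolate(latent_a, latent_b, 0.5))
print(gaussians_a.shape, gaussians_mid.shape)
```

In this sketch a zero latent reproduces the template exactly, which shows why predicting residuals keeps the output anchored to a plausible face. Interpolating identities costs only one blend and one decoder pass, with no per-face optimization.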