LAM: Large Avatar Model for One-shot Animatable Gaussian Head

📅 2025-02-25
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of reconstructing animatable 3D Gaussian head avatars from a single input image, without requiring video sequences for training or auxiliary animation/rendering networks. The proposed method introduces a multi-scale Transformer architecture grounded in FLAME-based canonical point queries, enabling direct regression of Gaussian parameters in canonical space from a single image. Geometrically consistent skeletal animation is achieved by integrating linear blend skinning (LBS) with learned deformation correction. Furthermore, differentiable Gaussian splatting enables real-time, cross-platform rendering, including on mobile devices. Compared to state-of-the-art approaches, the proposed method achieves superior reconstruction fidelity, generalization across diverse identities and expressions, and significantly improved inference efficiency on edge devices. Quantitative and qualitative evaluations on standard benchmarks demonstrate substantial gains in both real-time animation accuracy and rendering performance.
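The canonical-point-query idea described above can be illustrated with a minimal NumPy sketch of a single cross-attention step: embeddings of FLAME canonical points act as queries against flattened image feature tokens. This is an assumption-laden simplification (the function name, single head, and random projections are hypothetical; the actual model uses a multi-scale Transformer with learned weights):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def point_query_attention(point_embed, image_feats, Wq, Wk, Wv):
    """One cross-attention step: canonical point queries attend to
    flattened image feature tokens (illustrative sketch only).

    point_embed: (N, D) embeddings of FLAME canonical points
    image_feats: (M, D) flattened image feature tokens
    Wq, Wk, Wv:  (D, D) projection matrices (learned in practice)
    """
    q = point_embed @ Wq                             # (N, D) queries
    k = image_feats @ Wk                             # (M, D) keys
    v = image_feats @ Wv                             # (M, D) values
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (N, M) attention weights
    return attn @ v                                  # (N, D) updated point features
```

In the paper's pipeline, the updated per-point features would then be decoded into Gaussian attributes (position offset, scale, rotation, opacity, color) by a regression head.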

πŸ“ Abstract
We present LAM, an innovative Large Avatar Model for animatable Gaussian head reconstruction from a single image. Unlike previous methods that require extensive training on captured video sequences or rely on auxiliary neural networks for animation and rendering during inference, our approach generates Gaussian heads that are immediately animatable and renderable. Specifically, LAM creates an animatable Gaussian head in a single forward pass, enabling reenactment and rendering without additional networks or post-processing steps. This capability allows for seamless integration into existing rendering pipelines, ensuring real-time animation and rendering across a wide range of platforms, including mobile phones. The centerpiece of our framework is the canonical Gaussian attributes generator, which uses FLAME canonical points as queries. These points interact with multi-scale image features through a Transformer to accurately predict Gaussian attributes in the canonical space. The reconstructed canonical Gaussian avatar can then be animated with standard linear blend skinning (LBS) and corrective blendshapes, as in the FLAME model, and rendered in real time on various platforms. Our experimental results demonstrate that LAM outperforms state-of-the-art methods on existing benchmarks.
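The FLAME-style animation step the abstract refers to can be sketched as corrective blendshape offsets applied in canonical space, followed by linear blend skinning. The following NumPy sketch is a simplification under stated assumptions (the blendshape offsets are taken as already evaluated for the current expression/pose, and the function signature is hypothetical, not LAM's actual API):

```python
import numpy as np

def lbs(points, blendshape_offsets, joint_transforms, skin_weights):
    """Animate canonical points with linear blend skinning plus
    corrective blendshape offsets (FLAME-style, simplified sketch).

    points:             (N, 3) canonical point positions
    blendshape_offsets: (N, 3) corrective offsets, already evaluated
                        for the current expression/pose
    joint_transforms:   (J, 4, 4) rigid transform per joint
    skin_weights:       (N, J) skinning weights, each row sums to 1
    """
    # Apply corrective blendshapes in canonical space first.
    p = points + blendshape_offsets                            # (N, 3)
    p_h = np.concatenate([p, np.ones((len(p), 1))], axis=1)    # (N, 4) homogeneous

    # Blend per-joint transforms with the skinning weights, then
    # deform each point by its blended transform.
    blended = np.einsum("nj,jab->nab", skin_weights, joint_transforms)  # (N, 4, 4)
    deformed = np.einsum("nab,nb->na", blended, p_h)           # (N, 4)
    return deformed[:, :3]
```

Because the result is plain Gaussian parameters deformed by a fixed-function formula, this step needs no neural network at inference time, which is what makes the avatar directly renderable in standard pipelines.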
Problem

Research questions and friction points this paper is trying to address.

Single-image animatable Gaussian head reconstruction
Real-time animation without auxiliary networks
Canonical Gaussian attributes using Transformer
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single forward pass generation
Canonical Gaussian attributes generator
Real-time animation and rendering