FlexAvatar: Flexible Large Reconstruction Model for Animatable Gaussian Head Avatars with Detailed Deformation

📅 2025-12-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the problem of high-fidelity, animatable 3D head reconstruction from a single image or a sparse set of input images, without requiring camera poses or facial expression labels. Methodologically: (1) we introduce structured head query tokens as canonical-space anchors to decouple geometry from identity; (2) we design a UV-space-conditioned UNet decoder enabling lightweight, real-time expression-driven deformation; and (3) we propose a data distribution reweighting scheme with rare-expression augmentation, coupled with a 10-second identity fine-tuning module. Under fully unsupervised settings, our approach significantly improves 3D consistency and dynamic detail fidelity. It achieves state-of-the-art results in identity preservation for extreme identities, cross-expression generalization, and real-time rendering quality, outperforming existing methods across these dimensions.

📝 Abstract
We present FlexAvatar, a flexible large reconstruction model for high-fidelity 3D head avatars with detailed dynamic deformation from a single image or sparse images, without requiring camera poses or expression labels. It leverages a transformer-based reconstruction model with structured head query tokens as canonical anchors to aggregate a flexible number of camera-pose-free, expression-label-free inputs into a robust canonical 3D representation. For detailed dynamic deformation, we introduce a lightweight UNet decoder conditioned on UV-space position maps, which produces detailed expression-dependent deformations in real time. To better capture rare but critical expression details such as wrinkles and bared teeth, we also adopt a data distribution adjustment strategy that balances their frequency in the training set. Moreover, a lightweight 10-second refinement further enhances identity-specific details for extreme identities without affecting deformation quality. Extensive experiments demonstrate that FlexAvatar achieves superior 3D consistency and dynamic detail realism compared with previous methods, providing a practical solution for animatable 3D avatar creation.
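The UV-space conditioning described above can be sketched in miniature. This is not the paper's code: the function names, shapes, and the linear stand-in for the UNet are illustrative assumptions. The idea is that per-Gaussian canonical positions are scattered into a 2D UV-space position map, which a decoder consumes together with an expression code to predict per-texel deformation offsets.

```python
import numpy as np

def build_uv_position_map(positions, uv_coords, resolution=64):
    """Scatter canonical 3D Gaussian positions into a UV-space position map.

    positions: (N, 3) canonical-space positions
    uv_coords: (N, 2) per-Gaussian UV coordinates in [0, 1)
    Returns a (resolution, resolution, 3) map; empty texels stay zero.
    """
    uv_map = np.zeros((resolution, resolution, 3))
    texels = np.clip((uv_coords * resolution).astype(int), 0, resolution - 1)
    uv_map[texels[:, 1], texels[:, 0]] = positions  # last write wins per texel
    return uv_map

def predict_offsets(uv_map, expression_code, w_pos, w_expr):
    """Toy linear stand-in for the UNet decoder (hypothetical weights):
    per-texel 3D offsets from the UV position map plus a broadcast
    expression code."""
    h, w, _ = uv_map.shape
    expr = np.broadcast_to(expression_code, (h, w, expression_code.shape[0]))
    features = np.concatenate([uv_map, expr], axis=-1)   # (h, w, 3 + d)
    return features @ np.concatenate([w_pos, w_expr])    # (h, w, 3) offsets
```

Working in UV space ties the predicted deformation to the head surface and lets a small 2D convolutional decoder do the work, which is what makes real-time expression-dependent deformation plausible.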
Problem

Research questions and friction points this paper is trying to address.

Creates high-fidelity 3D head avatars from sparse images without camera poses or expression labels
Generates detailed expression-dependent deformations in real time using a lightweight UV-space decoder
Improves capture of rare expressions like wrinkles through data distribution adjustment and refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based reconstruction model with canonical anchor tokens
Lightweight UNet decoder for real-time UV-space deformation
Data distribution adjustment strategy for rare expression training
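The data distribution adjustment in the last bullet can be sketched as inverse-frequency sampling over expression clusters. The clustering, the weight formula, and the `power` knob are assumptions for illustration, not details from the paper.

```python
import numpy as np

def expression_sampling_weights(cluster_labels, power=1.0):
    """Per-sample weights inversely proportional to the frequency of each
    sample's expression cluster, so rare expressions (e.g. bared teeth)
    are drawn more often during training. (Illustrative scheme.)"""
    labels = np.asarray(cluster_labels)
    _, inverse, counts = np.unique(labels, return_inverse=True,
                                   return_counts=True)
    weights = (1.0 / counts[inverse]) ** power
    return weights / weights.sum()  # normalized sampling distribution
```

For example, with 8 "neutral" samples and 2 "bared_teeth" samples, each rare sample is drawn 4x as often as a common one; a `power` below 1 would soften the reweighting. The paper's actual balancing and rare-expression augmentation details are not specified here.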