🤖 AI Summary
Existing methods struggle to efficiently reconstruct high-fidelity, editable 3D head avatars from a small number of casually captured images, often requiring densely sampled multi-view inputs or time-consuming optimization. This work proposes the first explicitly decoupled 3D Gaussian representation for faces and hairstyles: planar Gaussians model the facial geometry, while strand-based Gaussians represent hair. A novel aggregation Transformer backbone is introduced to learn geometry-aware cross-view priors and enforce structural consistency between head and hair from sparse multi-view imagery. The method achieves state-of-the-art reconstruction quality in just a few minutes and supports real-time rendering, animation-driven manipulation, hairstyle transfer, and stylized editing, significantly enhancing both the efficiency and flexibility of digital avatar creation.
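To make the decoupling concrete, here is a rough illustrative sketch of how such a representation could be organized; all class names, fields, and the merge step below are hypothetical, inferred from the summary rather than taken from the paper's code. Holding face and hair as separate Gaussian sets that are only merged at render time is what makes hairstyle transfer a matter of swapping one component:

```python
# Hypothetical sketch of a face/hair-decoupled Gaussian avatar; names,
# shapes, and the merge step are assumptions, not the paper's actual API.
from dataclasses import dataclass
import torch


@dataclass
class PlanarFaceGaussians:
    """Facial Gaussians anchored on a 2D texture (UV) plane."""
    uv: torch.Tensor         # (N, 2) anchor coordinates in texture space
    offset: torch.Tensor     # (N, 3) displacement from the base head surface
    scale: torch.Tensor      # (N, 3) per-axis Gaussian extents
    rotation: torch.Tensor   # (N, 4) unit quaternions
    color: torch.Tensor      # (N, 3) RGB
    opacity: torch.Tensor    # (N, 1)


@dataclass
class StrandHairGaussians:
    """Hair Gaussians chained along explicit strands."""
    points: torch.Tensor     # (S, P, 3) P Gaussian centers per strand
    scale: torch.Tensor      # (S, P, 3)
    rotation: torch.Tensor   # (S, P, 4)
    color: torch.Tensor      # (S, P, 3)
    opacity: torch.Tensor    # (S, P, 1)


def compose(face: PlanarFaceGaussians, face_xyz: torch.Tensor,
            hair: StrandHairGaussians) -> dict:
    """Merge both components into one flat Gaussian set for rasterization.

    face_xyz: (N, 3) face anchor positions lifted from UV space onto the
    posed head surface (the lifting itself is outside this sketch).
    Swapping `hair` for another identity's strands is hairstyle transfer.
    """
    S, P = hair.points.shape[:2]
    return {
        "xyz": torch.cat([face_xyz + face.offset,
                          hair.points.reshape(S * P, 3)]),
        "scale": torch.cat([face.scale, hair.scale.reshape(S * P, 3)]),
        "rotation": torch.cat([face.rotation, hair.rotation.reshape(S * P, 4)]),
        "color": torch.cat([face.color, hair.color.reshape(S * P, 3)]),
        "opacity": torch.cat([face.opacity, hair.opacity.reshape(S * P, 1)]),
    }
```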
📝 Abstract
We present FHAvatar, a novel framework for reconstructing 3D Gaussian avatars with composable face and hair components from an arbitrary number of views. Unlike previous approaches that couple facial and hair representations within a unified modeling process, we explicitly decouple the two components in texture space by representing the face with planar Gaussians and the hair with strand-based Gaussians. To overcome the limitations of existing methods that rely on dense multi-view captures or costly per-identity optimization, we propose an aggregated transformer backbone to learn geometry-aware cross-view priors and head-hair structural coherence from multi-view datasets, enabling effective and efficient feature extraction and fusion from a few casual captures. Extensive quantitative and qualitative experiments demonstrate that FHAvatar achieves state-of-the-art reconstruction quality from only a few observations of new identities within minutes, while supporting real-time animation, convenient hairstyle transfer, and stylized editing, broadening the accessibility and applicability of digital avatar creation.
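For intuition, cross-view aggregation with a transformer could look something like the PyTorch sketch below, where per-view image tokens are tagged with their camera pose and fused by self-attention. The layer sizes, the camera embedding, and the pooling scheme are illustrative assumptions, not the paper's architecture:

```python
# Minimal sketch of transformer-based cross-view feature aggregation;
# dimensions, pose encoding, and output handling are assumptions.
import torch
import torch.nn as nn


class CrossViewAggregator(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8, depth: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Inject per-view geometry (flattened 4x4 camera matrix) so attention
        # can relate tokens across viewpoints.
        self.cam_embed = nn.Linear(16, dim)

    def forward(self, tokens: torch.Tensor, cams: torch.Tensor) -> torch.Tensor:
        # tokens: (B, V*T, dim) image features, T tokens from each of V views
        # cams:   (B, V*T, 16)  each token's camera matrix, flattened
        x = tokens + self.cam_embed(cams)
        return self.encoder(x)  # geometry-aware, view-fused tokens


# Toy usage: 3 casual captures of one identity, 64 tokens per view.
agg = CrossViewAggregator()
feats = torch.randn(1, 3 * 64, 256)
cams = torch.eye(4).reshape(1, 1, 16).expand(1, 3 * 64, 16)
out = agg(feats, cams)  # (1, 192, 256)
```

Because every token attends to tokens from all other views, the network can resolve the cross-view consistency that sparse casual captures would otherwise leave ambiguous, which is the role the abstract attributes to the aggregated backbone.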