🤖 AI Summary
Current general-purpose 3D head prior models jointly represent face and hair, ignoring the inherent compositionality of the human head. This yields entangled representations, limited support for face/hair swapping, and poor few-shot editability, especially under data scarcity. This work proposes a general prior framework that explicitly models hair compositionality: it synthesizes hairless data to construct paired training sets, leverages a diffusion prior to estimate hairless geometry and texture, and trains disentangled latent spaces so that face and hair are modeled as independent generative components. The method enables identity-preserving face and hair transfer between avatars, supports few-shot fine-tuning from monocular captures, and improves the controllability, editability, and practical flexibility of 3D avatar generation.
📝 Abstract
We present a universal prior model for 3D head avatars with explicit hair compositionality. Existing approaches to building generalizable priors for 3D head avatars often model the head holistically, treating the face and hair as an inseparable entity. This overlooks the inherent compositionality of the human head, making it difficult for the model to naturally disentangle face and hair representations, especially when training data is limited. Furthermore, such holistic models struggle to support applications like 3D face and hairstyle swapping in a flexible and controllable manner. To address these challenges, we introduce a prior model that explicitly accounts for the compositionality of face and hair, learning their latent spaces separately. A key enabler of this approach is our synthetic hairless data creation pipeline, which removes hair from studio-captured datasets using hairless geometry and texture estimated with a diffusion prior. By leveraging a paired dataset of hair and hairless captures, we train disentangled prior models for face and hair, incorporating compositionality as an inductive bias to facilitate effective separation. Our model's inherent compositionality enables seamless transfer of face and hair components between avatars while preserving identity. Additionally, we demonstrate that our model can be fine-tuned in a few-shot manner on monocular captures to create high-fidelity, hair-compositional 3D head avatars for unseen subjects. These capabilities highlight the practical applicability of our approach in real-world scenarios, paving the way for flexible and expressive 3D avatar generation.
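Because face and hair live in separate latent spaces, swapping reduces to recombining latents before decoding. The sketch below illustrates that idea only; all names and dimensions are hypothetical, and the real model decodes the combined latents into a 3D avatar representation rather than a flat vector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent sizes -- not taken from the paper.
FACE_DIM, HAIR_DIM = 64, 32

def encode_avatar(rng):
    """Stand-in for the paper's encoders: one independent latent per component."""
    return {"face": rng.normal(size=FACE_DIM),
            "hair": rng.normal(size=HAIR_DIM)}

def compose(face_z, hair_z):
    """Stand-in for the compositional decoder; concatenation here,
    where the actual model would render the composed avatar."""
    return np.concatenate([face_z, hair_z])

avatar_a = encode_avatar(rng)
avatar_b = encode_avatar(rng)

# Hairstyle swap: avatar A's face combined with avatar B's hair.
swapped = compose(avatar_a["face"], avatar_b["hair"])

# The face latent is untouched by the swap, so facial identity
# is preserved by construction.
assert np.array_equal(swapped[:FACE_DIM], avatar_a["face"])
assert np.array_equal(swapped[FACE_DIM:], avatar_b["hair"])
```

The point of the sketch is the inductive bias: because the two latents are learned separately, editing one component cannot perturb the other.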