🤖 AI Summary
Existing text-to-3D avatar generation methods predominantly rely on implicit representations (e.g., NeRF, SDF), which hinders direct editing and animation in standard DCC software. This work introduces a text-driven 3D avatar generation framework designed explicitly for editability: it operates on a template mesh, combining per-face learnable Jacobian deformations with 2D diffusion priors to achieve identity preservation, controllable geometric detail, and multi-view consistency. A learnable local vector field modulates the deformation, enabling anisotropic scaling while preserving vertex rotations, which improves expressiveness. The method requires no specific shape prior and natively inherits attributes of the template mesh such as 3DMM parameters and blendshapes. The output is a topologically regular, high-fidelity explicit triangle mesh that imports directly into mainstream DCC tools for rigging, editing, and animation, improving both the efficiency of asset creation and the artistic controllability of digital humans.
📝 Abstract
Current text-to-avatar methods often rely on implicit representations (e.g., NeRF, SDF, and DMTet), leading to 3D content that artists cannot easily edit and animate in graphics software. This paper introduces a novel framework for generating stylized head avatars from text guidance, which leverages locally learnable mesh deformation and 2D diffusion priors to produce high-quality digital assets for attribute-preserving manipulation. Given a template mesh, our method represents mesh deformation with per-face Jacobians and adaptively modulates local deformation using a learnable vector field. This vector field enables anisotropic scaling while preserving the rotation of vertices, better expressing identity and geometric details. We employ landmark- and contour-based regularization terms to balance the expressiveness and plausibility of generated avatars from multiple views without relying on any specific shape prior. Our framework can generate realistic shapes and textures that can be further edited via text, while supporting seamless editing using the preserved attributes from the template mesh, such as 3DMM parameters, blendshapes, and UV coordinates. Extensive experiments demonstrate that our framework can generate diverse and expressive head avatars with high-quality meshes that artists can easily manipulate in graphics software, facilitating downstream applications such as efficient asset creation and animation with preserved attributes.
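The abstract describes a learnable vector field that modulates per-face Jacobians so deformation scales anisotropically without rotating the face. The paper's exact formulation is not given here, so the following is only a minimal NumPy sketch of one standard way to build such a rotation-free modulation: from a local field vector `v`, construct a symmetric positive-definite matrix `J = I + (s - 1) d dᵀ` (with `d = v/‖v‖` and a hypothetical scale law `s = 1 + ‖v‖`) that stretches only along `d` and leaves orthogonal directions, and hence any rotational component of the base deformation, untouched. The function name and the scale law are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def anisotropic_jacobian(v: np.ndarray) -> np.ndarray:
    """Per-face modulation matrix induced by a local field vector v.

    Builds J = I + (s - 1) * d d^T with d = v / ||v|| and s = 1 + ||v||
    (the scale law is an illustrative assumption). J is symmetric
    positive-definite: a pure stretch along d with no rotational part,
    so composing it with a face's base transform preserves rotation.
    """
    n = np.linalg.norm(v)
    if n < 1e-12:
        # Zero field: no modulation, return the identity.
        return np.eye(3)
    d = v / n
    s = 1.0 + n  # scale factor derived from the field magnitude
    return np.eye(3) + (s - 1.0) * np.outer(d, d)


# A field vector of length 2 along x should stretch x by 3x
# and leave the orthogonal y and z directions unchanged.
J = anisotropic_jacobian(np.array([2.0, 0.0, 0.0]))
print(J @ np.array([1.0, 0.0, 0.0]))  # stretched along d
print(J @ np.array([0.0, 1.0, 0.0]))  # unchanged orthogonal to d
```

In pipelines that deform meshes via per-face Jacobians (e.g., Neural Jacobian Fields-style methods), such modulated Jacobians are then integrated back into vertex positions with a Poisson-style least-squares solve over the mesh gradient operator; that solve is omitted here for brevity.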