🤖 AI Summary
Existing diffusion-based methods for 3D human avatar generation face three key bottlenecks: (1) complex modeling of anatomical structure and pose, (2) scarcity of high-quality, semantically rich annotations, and (3) absence of production-ready skeletal rigging in outputs. To address these, we propose LLM-Augmented 3D Avatar Generation, a novel framework that integrates large language models’ commonsense reasoning into the 3D human generation pipeline. It combines multimodal alignment, procedural human modeling, and automated geometry-rig co-verification to establish a human-in-the-loop “generate → verify → refine” closed loop. The method supports both text and image conditioning, enabling fine-grained control over body and facial geometry. Experiments demonstrate state-of-the-art performance on text/image-to-avatar generation, yielding avatars with high-fidelity geometry, semantically consistent details, and production-ready skeletal rigs, which significantly accelerates digital content creation.
📝 Abstract
We introduce AvatarForge, a framework for generating animatable 3D human avatars from text or image inputs using AI-driven procedural generation. While diffusion-based methods have made strides in general 3D object generation, they struggle to produce high-quality, customizable human avatars because of the complexity and diversity of human body shapes and poses, a difficulty exacerbated by the scarcity of high-quality data. Additionally, animating these avatars remains a significant challenge for existing methods. AvatarForge overcomes these limitations by combining LLM-based commonsense reasoning with off-the-shelf 3D human generators, enabling fine-grained control over body and facial details. Unlike diffusion models, which are often tied to their training datasets and lack precise control over individual human features, AvatarForge offers a more flexible approach: it brings humans into the iterative design and modeling loop, and its auto-verification system allows continuous refinement of the generated avatars, promoting high accuracy and customization. Our evaluations show that AvatarForge outperforms state-of-the-art methods in both text- and image-to-avatar generation, making it a versatile tool for artistic creation and animation.