Multimodal Generation of Animatable 3D Human Models with AvatarForge

📅 2025-03-11
🤖 AI Summary
Existing diffusion-based methods for 3D human avatar generation face three key bottlenecks: (1) the complexity of modeling anatomical structure and pose, (2) the scarcity of high-quality, semantically rich annotations, and (3) the absence of production-ready skeletal rigging in outputs. To address these, we propose LLM-Augmented 3D Avatar Generation, a framework that integrates large language models’ commonsense reasoning into the 3D human generation pipeline. It combines multimodal alignment, procedural human modeling, and automated geometry-rig co-verification to establish a human-in-the-loop “generate → verify → refine” closed loop. The method supports both text and image conditioning and gives fine-grained control over body and facial geometry. Experiments demonstrate state-of-the-art performance on text- and image-to-avatar generation, yielding avatars with high-fidelity geometry, semantically consistent details, and production-ready skeletal rigs, which can significantly accelerate digital content creation.

📝 Abstract
We introduce AvatarForge, a framework for generating animatable 3D human avatars from text or image inputs using AI-driven procedural generation. While diffusion-based methods have made strides in general 3D object generation, they struggle to produce high-quality, customizable human avatars because of the complexity and diversity of human body shapes and poses, a difficulty exacerbated by the scarcity of high-quality data. Additionally, animating these avatars remains a significant challenge for existing methods. AvatarForge overcomes these limitations by combining LLM-based commonsense reasoning with off-the-shelf 3D human generators, enabling fine-grained control over body and facial details. Unlike diffusion models, which often rely on pre-trained datasets that lack precise control over individual human features, AvatarForge brings humans into the iterative design and modeling loop. Its auto-verification system allows continuous refinement of the generated avatars, promoting high accuracy and customization. Our evaluations show that AvatarForge outperforms state-of-the-art methods in both text- and image-to-avatar generation, making it a versatile tool for artistic creation and animation.

Problem

Research questions and friction points this paper is trying to address.

Generates animatable 3D human avatars from text or images.
Overcomes limitations in high-quality, customizable avatar creation.
Enables fine-grained control over body and facial details.

Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-driven procedural generation for 3D avatars
LLM-based commonsense reasoning for fine control
Auto-verification system for continuous refinement
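
The "generate → verify → refine" loop named in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`generate_verify_refine`, `toy_propose`, `toy_verify`, `toy_refine`, the `height` parameter and its 0.8 threshold) is a hypothetical stand-in. In a real system the proposer and verifier would presumably be backed by an LLM and a parametric 3D human generator; here they are toy functions so the control flow can run on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]          # hypothetical avatar parameters in [0, 1]
Issue = Tuple[str, float]          # (parameter name, direction to move: +1 or -1)


@dataclass
class LoopResult:
    params: Params
    rounds: int        # number of refinement steps applied
    accepted: bool     # True if the verifier reported no issues at the end


def generate_verify_refine(
    prompt: str,
    propose: Callable[[str], Params],
    verify: Callable[[str, Params], List[Issue]],
    refine: Callable[[Params, List[Issue]], Params],
    max_rounds: int = 5,
) -> LoopResult:
    """Propose parameters, then alternate verification and refinement."""
    params = propose(prompt)
    for rounds in range(max_rounds):
        issues = verify(prompt, params)
        if not issues:
            return LoopResult(params, rounds, True)
        params = refine(params, issues)
    # Budget exhausted: run one final check on the last refinement.
    return LoopResult(params, max_rounds, not verify(prompt, params))


# --- Toy stand-ins (assumptions, for illustration only) ---

def toy_propose(prompt: str) -> Params:
    # Start from a neutral body regardless of the prompt.
    return {"height": 0.5}


def toy_verify(prompt: str, params: Params) -> List[Issue]:
    # Flag a mismatch if the prompt asks for a tall avatar but height is low.
    if "tall" in prompt and params["height"] < 0.8:
        return [("height", +1.0)]
    return []


def toy_refine(params: Params, issues: List[Issue]) -> Params:
    # Nudge each flagged parameter by a fixed step, clamped to [0, 1].
    updated = dict(params)
    for name, direction in issues:
        updated[name] = min(1.0, max(0.0, updated[name] + 0.2 * direction))
    return updated


if __name__ == "__main__":
    result = generate_verify_refine(
        "a tall athlete", toy_propose, toy_verify, toy_refine
    )
    print(result)
```

Capping the loop with `max_rounds` and returning an `accepted` flag is one possible design choice: it lets a caller hand an unconverged result to a human for further adjustment, which matches the human-in-the-loop framing of the abstract.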