🤖 AI Summary
Existing single-image 3D human reconstruction methods suffer from insufficient geometric detail, multi-view inconsistency, and a lack of direct articulation control. To address these limitations, we propose a pose-conditioned 3D joint diffusion model that jointly generates multi-view images and a 3D Gaussian Splatting (3DGS) reconstruction, enabling both standardized A-pose generation and arbitrary-pose retargeting. We further introduce a local refinement mechanism guided by a crop-aware camera ray map, which sharpens surface detail in individual body parts while keeping them seamlessly integrated. Our method achieves state-of-the-art results on public benchmarks and in-the-wild images, outperforming prior approaches in both reconstruction accuracy and reposing quality. Crucially, the generated 3D avatar is directly riggable and animatable, with no manual skinning or post-hoc optimization required.
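The per-step coupling of multi-view denoising and 3DGS reconstruction can be illustrated with a toy loop. This is a minimal sketch of the general pattern only: every helper, tensor shape, and blending weight below is an illustrative assumption, not AdaHuman's actual pipeline.

```python
import numpy as np

N_VIEWS, H, W = 4, 32, 32  # toy multi-view resolution

def denoise_views(noisy_views, t, pose_cond):
    """Stand-in for the pose-conditioned multi-view denoiser (hypothetical)."""
    # A real model predicts noise from a network; here we just damp toward
    # zero and nudge by the pose conditioning signal.
    return noisy_views * (1.0 - 1.0 / (t + 1)) + 0.01 * pose_cond

def reconstruct_3dgs(views):
    """Stand-in for per-step 3DGS reconstruction from the current views."""
    # Represent the "Gaussians" as a single per-pixel mean across views.
    return views.mean(axis=0)

def render_views(gaussians, n_views):
    """Stand-in for splatting the Gaussians back into each camera view."""
    return np.repeat(gaussians[None], n_views, axis=0)

rng = np.random.default_rng(0)
views = rng.standard_normal((N_VIEWS, H, W, 3))   # start from noise
pose_cond = np.zeros((N_VIEWS, H, W, 3))          # target-pose conditioning

for t in reversed(range(10)):
    views = denoise_views(views, t, pose_cond)
    gaussians = reconstruct_3dgs(views)            # 3DGS at this step
    renders = render_views(gaussians, N_VIEWS)     # re-rendered views
    views = 0.5 * views + 0.5 * renders            # pull views toward a
                                                   # shared 3D-consistent state
```

The key idea this mimics is that the 3D reconstruction is in the loop at every denoising step, so the re-rendered views continually enforce cross-view consistency rather than being reconciled only after generation.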
📝 Abstract
Existing methods for image-to-3D avatar generation struggle to produce highly detailed, animation-ready avatars suitable for real-world applications. We introduce AdaHuman, a novel framework that generates high-fidelity animatable 3D avatars from a single in-the-wild image. AdaHuman incorporates two key innovations: (1) a pose-conditioned 3D joint diffusion model that synthesizes consistent multi-view images in arbitrary poses alongside a corresponding 3D Gaussian Splatting (3DGS) reconstruction at each diffusion step; (2) a compositional 3DGS refinement module that enhances the details of local body parts through image-to-image refinement and seamlessly integrates them using a novel crop-aware camera ray map, producing a cohesive, detailed 3D avatar. Together, these components allow AdaHuman to generate highly realistic standardized A-pose avatars with minimal self-occlusion, enabling rigging and animation with any input motion. Extensive evaluation on public benchmarks and in-the-wild images demonstrates that AdaHuman significantly outperforms state-of-the-art methods in both avatar reconstruction and reposing. Code and models will be publicly available for research purposes.
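The crop-aware camera ray map can be sketched as follows: fold the crop's offset and scale into the pixel coordinates before unprojecting through the intrinsics, so that rays for a zoomed-in body part stay consistent with the original full-image camera. The function name, signature, and numbers below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def crop_aware_ray_map(K, crop_xywh, out_hw):
    """Per-pixel unit ray directions for a crop of the full image.

    K         : 3x3 pinhole intrinsics of the ORIGINAL full image.
    crop_xywh : (x0, y0, crop_w, crop_h) crop window in full-image pixels.
    out_hw    : (h, w) resolution the crop is resampled to.
    """
    x0, y0, cw, ch = crop_xywh
    h, w = out_hw
    # Pixel centers of the resampled crop, expressed in full-image
    # pixel coordinates (this is the "crop-aware" part).
    us = x0 + (np.arange(w) + 0.5) * cw / w
    vs = y0 + (np.arange(h) + 0.5) * ch / h
    u, v = np.meshgrid(us, vs)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # (h, w, 3)
    dirs = pix @ np.linalg.inv(K).T                        # unproject
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)   # unit rays
    return dirs

# Example: a 128x128 face crop of a 512x512 image, refined at 64x64.
K = np.array([[500.0, 0.0, 256.0],
              [0.0, 500.0, 256.0],
              [0.0,   0.0,   1.0]])
rays = crop_aware_ray_map(K, crop_xywh=(100, 120, 128, 128), out_hw=(64, 64))
```

Conditioning the local refiner on such a map tells it exactly where each refined pixel sits in the original camera's frustum, which is what lets independently refined crops be composited back into one coherent 3DGS avatar.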