🤖 AI Summary
Existing single-image 3D human reconstruction methods suffer from insufficient geometric detail, multi-view inconsistency, and a lack of direct articulation control. To address these limitations, we propose a pose-conditioned 3D joint diffusion model that jointly generates multi-view images and a 3D Gaussian Splatting (3DGS) reconstruction, enabling both standardized A-pose generation and arbitrary-pose retargeting. We further introduce a local refinement mechanism guided by a crop-aware camera ray map, which sharpens surface detail in individual body parts while keeping them seamlessly integrated. Our method achieves state-of-the-art results on public benchmarks and in-the-wild images, outperforming prior approaches in both reconstruction accuracy and reposing quality. Crucially, the generated 3D avatar is directly riggable and animatable, with no manual skinning or post-hoc optimization required.
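The per-step coupling of multi-view denoising and 3DGS reconstruction can be illustrated with a toy loop. This is a minimal sketch of the general pattern only: every helper, tensor shape, and blending weight below is an illustrative assumption, not AdaHuman's actual pipeline.

```python
import numpy as np

N_VIEWS, H, W = 4, 32, 32  # toy multi-view resolution

def denoise_views(noisy_views, t, pose_cond):
    """Stand-in for the pose-conditioned multi-view denoiser (hypothetical)."""
    # A real model predicts noise from a network; here we just damp toward
    # zero and nudge by the pose conditioning signal.
    return noisy_views * (1.0 - 1.0 / (t + 1)) + 0.01 * pose_cond

def reconstruct_3dgs(views):
    """Stand-in for per-step 3DGS reconstruction from the current views."""
    # Represent the "Gaussians" as a single per-pixel mean across views.
    return views.mean(axis=0)

def render_views(gaussians, n_views):
    """Stand-in for splatting the Gaussians back into each camera view."""
    return np.repeat(gaussians[None], n_views, axis=0)

rng = np.random.default_rng(0)
views = rng.standard_normal((N_VIEWS, H, W, 3))   # start from noise
pose_cond = np.zeros((N_VIEWS, H, W, 3))          # target-pose conditioning

for t in reversed(range(10)):
    views = denoise_views(views, t, pose_cond)
    gaussians = reconstruct_3dgs(views)            # 3DGS at this step
    renders = render_views(gaussians, N_VIEWS)     # re-rendered views
    views = 0.5 * views + 0.5 * renders            # pull views toward a
                                                   # shared 3D-consistent state
```

The key idea this mimics is that the 3D reconstruction is in the loop at every denoising step, so the re-rendered views continually enforce cross-view consistency rather than being reconciled only after generation.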
📝 Abstract
Existing methods for image-to-3D avatar generation struggle to produce highly detailed, animation-ready avatars suitable for real-world applications. We introduce AdaHuman, a novel framework that generates high-fidelity animatable 3D avatars from a single in-the-wild image. AdaHuman incorporates two key innovations: (1) a pose-conditioned 3D joint diffusion model that synthesizes consistent multi-view images in arbitrary poses alongside a corresponding 3D Gaussian Splatting (3DGS) reconstruction at each diffusion step; (2) a compositional 3DGS refinement module that enhances the details of local body parts through image-to-image refinement and seamlessly integrates them using a novel crop-aware camera ray map, producing a cohesive, detailed 3D avatar. Together, these components allow AdaHuman to generate highly realistic standardized A-pose avatars with minimal self-occlusion, enabling rigging and animation with any input motion. Extensive evaluation on public benchmarks and in-the-wild images demonstrates that AdaHuman significantly outperforms state-of-the-art methods in both avatar reconstruction and reposing. Code and models will be publicly available for research purposes.
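The crop-aware camera ray map can be sketched as follows: fold the crop's offset and scale into the pixel coordinates before unprojecting through the intrinsics, so that rays for a zoomed-in body part stay consistent with the original full-image camera. The function name, signature, and numbers below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def crop_aware_ray_map(K, crop_xywh, out_hw):
    """Per-pixel unit ray directions for a crop of the full image.

    K         : 3x3 pinhole intrinsics of the ORIGINAL full image.
    crop_xywh : (x0, y0, crop_w, crop_h) crop window in full-image pixels.
    out_hw    : (h, w) resolution the crop is resampled to.
    """
    x0, y0, cw, ch = crop_xywh
    h, w = out_hw
    # Pixel centers of the resampled crop, expressed in full-image
    # pixel coordinates (this is the "crop-aware" part).
    us = x0 + (np.arange(w) + 0.5) * cw / w
    vs = y0 + (np.arange(h) + 0.5) * ch / h
    u, v = np.meshgrid(us, vs)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # (h, w, 3)
    dirs = pix @ np.linalg.inv(K).T                        # unproject
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)   # unit rays
    return dirs

# Example: a 128x128 face crop of a 512x512 image, refined at 64x64.
K = np.array([[500.0, 0.0, 256.0],
              [0.0, 500.0, 256.0],
              [0.0,   0.0,   1.0]])
rays = crop_aware_ray_map(K, crop_xywh=(100, 120, 128, 128), out_hw=(64, 64))
```

Conditioning the local refiner on such a map tells it exactly where each refined pixel sits in the original camera's frustum, which is what lets independently refined crops be composited back into one coherent 3DGS avatar.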