🤖 AI Summary
Generating animatable, style-agnostic (photorealistic, cartoon, anime), and topology-consistent 3D avatars from a single portrait image remains challenging due to limited style generalization, loss of accessory and hairstyle detail, lack of articulation control, and severe artifacts. This paper proposes SOAP, a single-image, style-omniscient 3D avatar generation framework that integrates multi-view diffusion priors with adaptive FLAME-mesh optimization via differentiable rendering, jointly refining geometry, texture, and rigging while preserving topology. The method supports FACS-based facial animation, models eyeballs and teeth, and faithfully reconstructs complex hairstyles and accessories. Trained on a newly introduced dataset of 24K multi-style 3D heads, it significantly outperforms state-of-the-art methods in both single-view head reconstruction and image-to-3D generation, yielding avatars with superior texture fidelity, physically plausible geometry, and strong animation controllability. Code and data are publicly available.
📝 Abstract
Creating animatable 3D avatars from a single image remains challenging due to style limitations (realistic, cartoon, anime) and difficulties in handling accessories or hairstyles. While 3D diffusion models advance single-view reconstruction for general objects, their outputs often lack animation controls or suffer from artifacts because of the domain gap. We propose SOAP, a style-omniscient framework to generate rigged, topology-consistent avatars from any portrait. Our method leverages a multiview diffusion model trained on 24K 3D heads with multiple styles and an adaptive optimization pipeline that deforms the FLAME mesh while maintaining topology and rigging via differentiable rendering. The resulting textured avatars support FACS-based animation, integrate with eyeballs and teeth, and preserve details like braided hair or accessories. Extensive experiments demonstrate the superiority of our method over state-of-the-art techniques for both single-view head modeling and diffusion-based image-to-3D generation. Our code and data are publicly available for research purposes at https://github.com/TingtingLiao/soap.
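The key idea in the abstract — deforming a fixed-topology template (FLAME) toward image evidence while a regularizer keeps the mesh well-behaved so rigging still applies — can be sketched in miniature. The toy below is not the authors' code: it replaces the differentiable photometric loss with a direct vertex-to-target term, works in 2D, and uses a uniform Laplacian smoothness penalty as a stand-in for SOAP's topology/rigging constraints. All names (`fit_template`, `lam`, etc.) are illustrative.

```python
# Conceptual sketch (assumed, not from the paper): topology-preserving
# template deformation by gradient descent. Connectivity never changes,
# only vertex positions move — the property that keeps rigging valid.

def laplacian(verts, neighbors):
    """Per-vertex offset from the neighbor centroid (uniform graph Laplacian)."""
    out = []
    for i, v in enumerate(verts):
        cx = sum(verts[j][0] for j in neighbors[i]) / len(neighbors[i])
        cy = sum(verts[j][1] for j in neighbors[i]) / len(neighbors[i])
        out.append((v[0] - cx, v[1] - cy))
    return out

def fit_template(template, targets, neighbors, lam=0.1, lr=0.1, steps=200):
    """Deform `template` toward per-vertex `targets` while staying smooth.

    Minimizes  sum_i ||v_i - t_i||^2 + lam * sum_i ||L(v)_i||^2
    by explicit gradient descent. In SOAP the data term would instead be a
    rendering loss backpropagated through a differentiable renderer.
    """
    verts = [list(v) for v in template]
    for _ in range(steps):
        lap = laplacian(verts, neighbors)
        for i, (v, t) in enumerate(zip(verts, targets)):
            # gradient of the data term is 2(v - t); the Laplacian term
            # pulls each vertex toward the centroid of its neighbors
            gx = 2 * (v[0] - t[0]) + 2 * lam * lap[i][0]
            gy = 2 * (v[1] - t[1]) + 2 * lam * lap[i][1]
            v[0] -= lr * gx
            v[1] -= lr * gy
    return verts
```

Because only positions are optimized and the neighbor lists are fixed, the fitted mesh keeps the template's vertex order and connectivity — which is exactly why blendshape/FACS rigs defined on the template transfer to the result.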