SOAP: Style-Omniscient Animatable Portraits

📅 2025-05-08
🤖 AI Summary
Generating animatable, style-agnostic (photorealistic/cartoon/anime), and topology-consistent 3D avatars from a single portrait image remains challenging due to limited style generalization, loss of accessory and hairstyle details, lack of articulation control, and severe artifacts. This paper proposes the first single-image-driven, style-invariant 3D avatar generation framework, integrating multi-view diffusion priors with FLAME deformation-adaptive optimization via differentiable rendering to jointly optimize geometry, texture, and rigging parameters. Our method supports FACS-based facial animation, eyeball and teeth modeling, and high-fidelity reconstruction of complex hairstyles and accessories. Evaluated on our newly introduced 24K multi-style 3D avatar dataset, it significantly outperforms state-of-the-art methods in both single-view reconstruction and image-to-3D generation, yielding avatars with superior texture fidelity, physically plausible geometry, and strong animation controllability. Code and data are publicly available.

📝 Abstract
Creating animatable 3D avatars from a single image remains challenging due to style limitations (realistic, cartoon, anime) and difficulties in handling accessories or hairstyles. While 3D diffusion models advance single-view reconstruction for general objects, outputs often lack animation controls or suffer from artifacts because of the domain gap. We propose SOAP, a style-omniscient framework to generate rigged, topology-consistent avatars from any portrait. Our method leverages a multiview diffusion model trained on 24K 3D heads with multiple styles and an adaptive optimization pipeline that deforms the FLAME mesh while maintaining topology and rigging via differentiable rendering. The resulting textured avatars support FACS-based animation, integrate eyeballs and teeth, and preserve details like braided hair or accessories. Extensive experiments demonstrate the superiority of our method over state-of-the-art techniques for both single-view head modeling and diffusion-based image-to-3D generation. Our code and data are publicly available for research purposes at https://github.com/TingtingLiao/soap.
Problem

Research questions and friction points this paper is trying to address.

Creating animatable 3D avatars from single images
Overcoming style limitations and accessory challenges
Generating rigged, topology-consistent avatars for animation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiview diffusion model for diverse styles
Adaptive optimization maintains mesh topology
Differentiable rendering jointly optimizes geometry, texture, and rigging
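The second bullet, topology-preserving deformation, can be pictured as a gradient-descent loop that pulls template vertices toward image evidence while a smoothness term discourages distortions that would break the template's connectivity. Below is a minimal toy sketch of that idea in plain NumPy, not the paper's pipeline: `fit_template`, the 2D chain "mesh", the direct use of target positions in place of a rendered loss, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def laplacian(verts, neighbors):
    """Umbrella Laplacian: each vertex minus the mean of its ring neighbors."""
    return np.stack([v - verts[idx].mean(axis=0)
                     for v, idx in zip(verts, neighbors)])

def fit_template(template, target, neighbors, lam=0.5, lr=0.2, steps=300):
    """Deform `template` toward `target` observations; a Laplacian term
    keeps each vertex near the mean of its neighbors, preserving the
    template's local structure. Gradients are analytic for this toy
    quadratic objective (a differentiable renderer would supply them
    in a real image-based pipeline)."""
    verts = template.copy()
    for _ in range(steps):
        data_grad = verts - target                 # grad of 0.5*||v - t||^2
        smooth_grad = laplacian(verts, neighbors)  # pulls v toward neighbor mean
        verts -= lr * (data_grad + lam * smooth_grad)
    return verts

# Toy "mesh": a 4-vertex 2D chain; the target is a shifted, bent version.
template = np.array([[0., 0.], [1., 0.], [2., 0.], [3., 0.]])
target   = np.array([[0., .5], [1., .8], [2., .8], [3., .5]])
neighbors = [[1], [0, 2], [1, 3], [2]]

fitted = fit_template(template, target, neighbors)
```

Because the connectivity (`neighbors`) is fixed and only vertex positions move, the deformed result keeps the template's topology, which is what lets a rigged template like FLAME retain its animation controls after fitting.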
Authors

Tingting Liao, PhD student at MBZUAI (3D Human Generation)
Yujian Zheng, Mohamed bin Zayed University of Artificial Intelligence (Computer Graphics, Computer Vision)
Adilbek Karmanov, Mohamed bin Zayed University of Artificial Intelligence, UAE
Liwen Hu, Pinscreen, USA
Leyang Jin, Mohamed bin Zayed University of Artificial Intelligence, UAE
Yuliang Xiu, Westlake University | Max Planck Institute for Intelligent Systems (Computer Graphics, Computer Vision, Digital Humans)
Hao Li, Mohamed bin Zayed University of Artificial Intelligence, UAE and Pinscreen, USA