PERSONA: Personalized Whole-Body 3D Avatar with Pose-Driven Deformations from a Single Image

📅 2025-08-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenging problem of reconstructing animatable, identity-consistent, full-body 3D human avatars from a single input image. To tackle identity drift in diffusion-based video generation and ensure geometric fidelity, the authors propose a method that integrates diffusion priors with geometry-aware 3D optimization. Specifically, they introduce a balanced sampling strategy to mitigate identity inconsistency across poses and a geometry-weighted optimization scheme that prioritizes geometry constraints over image loss. The framework first uses a diffusion model to synthesize a pose-rich training video from the single image; this video then drives optimization of a NeRF- or 3DGS-based avatar. Experiments demonstrate high-fidelity, cloth-aware dynamic geometry, photorealistic appearance, and strong cross-pose identity consistency, significantly outperforming existing single-image 3D human reconstruction methods.

📝 Abstract
Two major approaches exist for creating animatable human avatars. The first, a 3D-based approach, optimizes a NeRF- or 3DGS-based avatar from videos of a single person, achieving personalization through a disentangled identity representation. However, modeling pose-driven deformations, such as non-rigid cloth deformations, requires numerous pose-rich videos, which are costly and impractical to capture in daily life. The second, a diffusion-based approach, learns pose-driven deformations from large-scale in-the-wild videos but struggles with identity preservation and pose-dependent identity entanglement. We present PERSONA, a framework that combines the strengths of both approaches to obtain a personalized 3D human avatar with pose-driven deformations from a single image. PERSONA leverages a diffusion-based approach to generate pose-rich videos from the input image and optimizes a 3D avatar based on them. To ensure high authenticity and sharp renderings across diverse poses, we introduce balanced sampling and geometry-weighted optimization. Balanced sampling oversamples the input image to mitigate identity shifts in diffusion-generated training videos. Geometry-weighted optimization prioritizes geometry constraints over image loss, preserving rendering quality in diverse poses.
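The two techniques named in the abstract can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the function names, the `input_ratio` knob, and the geometry weight `w_geo` are all hypothetical, chosen only to show the idea of oversampling the identity-true input image and up-weighting geometry constraints over image loss.

```python
import random

def balanced_sample(frame_ids, input_id, input_ratio=0.5):
    """Pick a training frame, oversampling the original input image.

    `frame_ids` index the diffusion-generated video frames; `input_id` is
    the single real input image. `input_ratio` (hypothetical, not from the
    paper) controls how often the identity-true input is revisited, which
    mitigates identity drift in the generated frames.
    """
    if random.random() < input_ratio:
        return input_id               # revisit the real input image
    return random.choice(frame_ids)   # otherwise a generated pose frame

def geometry_weighted_loss(l_image, l_geometry, w_geo=10.0):
    """Combine per-step losses, prioritizing geometry over image loss.

    The weight value is illustrative; the point is only that w_geo > 1,
    so geometry constraints dominate the gradient in unseen poses.
    """
    return l_image + w_geo * l_geometry
```

In an actual avatar-training loop, `balanced_sample` would select which frame supervises each iteration, and `geometry_weighted_loss` would replace a plain image-reconstruction objective.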
Problem

Research questions and friction points this paper is trying to address.

Creating personalized 3D avatars from single images
Modeling pose-driven deformations without extensive video data
Preserving identity in diffusion-based avatar generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines 3D and diffusion-based approaches for avatars
Generates pose-rich videos from a single image
Uses balanced sampling and geometry-weighted optimization