🤖 AI Summary
This work addresses the limitations of existing single-image 3D human reconstruction methods, which rely on rigid joint transformations and struggle to capture realistic clothing dynamics. To overcome this, we propose DynaAvatar, a novel framework that, for the first time, enables zero-shot generation of animatable 3D human avatars from a single image while accurately recovering motion-induced clothing deformations. Built upon a Transformer-based feed-forward architecture, DynaAvatar directly predicts dynamic 3D Gaussian deformations without requiring test-time optimization. The method leverages static-to-dynamic knowledge transfer, lightweight LoRA fine-tuning, a DynaFlow optical flow-guided loss, and an SMPL-X re-annotation strategy to significantly enhance the fidelity of dynamic clothing modeling. Experiments demonstrate that DynaAvatar substantially outperforms current approaches in both visual realism and generalization capability.
📝 Abstract
Existing single-image 3D human avatar methods primarily rely on rigid joint transformations, limiting their ability to model realistic cloth dynamics. We present DynaAvatar, a zero-shot framework that reconstructs animatable 3D human avatars with motion-dependent cloth dynamics from a single image. Trained on large-scale multi-person motion datasets, DynaAvatar employs a Transformer-based feed-forward architecture that directly predicts dynamic 3D Gaussian deformations without subject-specific optimization. To overcome the scarcity of dynamic captures, we introduce a static-to-dynamic knowledge transfer strategy: a Transformer pretrained on large-scale static captures provides strong geometric and appearance priors, which are efficiently adapted to motion-dependent deformations through lightweight LoRA fine-tuning on dynamic captures. We further propose the DynaFlow loss, an optical flow-guided objective that provides reliable geometric cues along motion directions for cloth dynamics in rendered space. Finally, we re-annotate missing or noisy SMPL-X fittings in existing dynamic capture datasets, most of which contain fittings too incomplete or unreliable for training high-quality 3D avatar reconstruction models. Experiments demonstrate that DynaAvatar produces visually rich and generalizable animations, outperforming prior methods.
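The abstract does not detail the fine-tuning setup. As background, the low-rank adaptation idea behind LoRA can be sketched as follows; the matrix `W` stands in for a frozen projection of the Transformer pretrained on static captures, and the rank and scaling values are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 64, 4

# Frozen pretrained weight (placeholder for a Transformer projection
# trained on static captures; DynaAvatar's actual weights are not public).
W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)

# Trainable low-rank factors. B starts at zero, so fine-tuning on
# dynamic captures begins exactly at the pretrained function.
A = rng.standard_normal((rank, d_in)) / np.sqrt(d_in)
B = np.zeros((d_out, rank))
alpha = 8.0  # LoRA scaling hyperparameter (illustrative value)

def lora_forward(x):
    """Adapted layer: y = x W^T + (alpha / rank) * x A^T B^T."""
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

x = rng.standard_normal((2, d_in))
# With B = 0, the adapted layer matches the frozen base exactly.
assert np.allclose(lora_forward(x), x @ W.T)
```

Only `A` and `B` receive gradients during adaptation, which is why the fine-tuning on scarce dynamic captures stays lightweight relative to retraining the full Transformer.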