NoPo-Avatar: Generalizable and Animatable Avatars from Sparse Inputs without Human Poses

📅 2025-11-20
🤖 AI Summary
This work addresses the problem of reconstructing animatable 3D human avatars from a single image or a sparse set of images, without pose priors. The authors propose an end-to-end neural implicit framework that requires no pose input at test time: it eliminates reliance on accurate pose estimation or camera parameters, jointly modeling implicit geometry and view-dependent appearance, with images serving as the sole input at inference and the sole supervision signal during training. The key contribution is decoupling pose learning from deformation modeling, which prevents pose-estimation noise from propagating into geometry reconstruction and significantly improves robustness in real-world scenarios. Extensive experiments on THuman2.0, XHuman, and HuGe100K show that the approach substantially outperforms existing methods under pose-free conditions, while achieving comparable performance when ground-truth poses are available, validating its generality and effectiveness.

📝 Abstract
We tackle the task of recovering an animatable 3D human avatar from a single image or a sparse set of images. For this task, beyond a set of images, many prior state-of-the-art methods use accurate "ground-truth" camera poses and human poses as input to guide reconstruction at test time. We show that pose-dependent reconstruction degrades significantly if pose estimates are noisy. To overcome this, we introduce NoPo-Avatar, which reconstructs avatars solely from images, without any pose input. By removing the dependence of test-time reconstruction on human poses, NoPo-Avatar is not affected by noisy human pose estimates, making it more widely applicable. Experiments on the challenging THuman2.0, XHuman, and HuGe100K datasets show that NoPo-Avatar outperforms existing baselines in practical settings (without ground-truth poses) and delivers comparable results in lab settings (with ground-truth poses).
Problem

Research questions and friction points this paper is trying to address.

Recovering animatable 3D human avatars from sparse image inputs
Eliminating dependence on accurate ground-truth human pose data
Overcoming performance degradation caused by noisy pose estimates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconstructs avatars solely from images
Eliminates dependence on human pose input
Outperforms baselines without ground-truth poses
Authors
Jing Wen — University of Illinois Urbana-Champaign
Alexander G. Schwing — University of Illinois Urbana-Champaign
Shenlong Wang — University of Illinois Urbana-Champaign

Topics: Computer Vision · Robot Perception · Autonomous Driving