Better Together: Unified Motion Capture and 3D Avatar Reconstruction

📅 2025-03-12
🤖 AI Summary
This paper addresses the joint problem of estimating human pose and reconstructing a photorealistic 3D avatar from multi-view video. It proposes a joint optimization framework built on a novel animatable avatar, in which 3D Gaussians are rigged on a personalized mesh, and on time-dependent MLPs that model the skeletal motion sequence and yield accurate, temporally consistent pose estimates. On challenging yoga poses, the method reduces error by 35% on body joints and 45% on hand joints compared to keypoint-based methods. It also improves novel-view synthesis quality of the animatable avatars by 2 dB PSNR on diverse challenging subjects. The core contributions are: (1) a personalized, mesh-rigged 3D Gaussian avatar that can be animated; and (2) joint optimization of skeletal motion and the renderable body model.
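
To make the idea of Gaussians rigged on a mesh concrete, here is a minimal sketch of one common way such a binding can work. It is an assumption for illustration, not the paper's stated formulation: each Gaussian center is stored as barycentric coordinates on a triangle plus an offset along the face normal, so it follows the mesh when the mesh moves. The names `triangle_frames`, `gaussian_centers`, `bary`, and `normal_offset` are hypothetical.

```python
import numpy as np

def triangle_frames(vertices, faces):
    """Return per-face vertex triples (F, 3, 3) and unit face normals (F, 3)."""
    tri = vertices[faces]
    normals = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    return tri, normals

def gaussian_centers(vertices, faces, face_idx, bary, normal_offset):
    """Place each Gaussian on its parent triangle (hypothetical binding).

    face_idx:      (G,)   index of the triangle each Gaussian is attached to
    bary:          (G, 3) barycentric coordinates inside that triangle
    normal_offset: (G,)   signed distance along the face normal
    """
    tri, normals = triangle_frames(vertices, faces)
    # Barycentric interpolation of the three triangle corners.
    on_surface = np.einsum("gk,gkd->gd", bary, tri[face_idx])
    return on_surface + normal_offset[:, None] * normals[face_idx]

# A single triangle in the xy-plane with two attached Gaussians.
rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2]])
face_idx = np.array([0, 0])
bary = np.array([[1 / 3, 1 / 3, 1 / 3], [0.5, 0.5, 0.0]])
normal_offset = np.array([0.0, 0.1])

rest_centers = gaussian_centers(rest, faces, face_idx, bary, normal_offset)

# Moving the mesh (here a simple translation) moves the Gaussians with it.
posed = rest + np.array([0.0, 0.0, 2.0])
posed_centers = gaussian_centers(posed, faces, face_idx, bary, normal_offset)
```

Because the centers are recomputed from the posed vertices, any change to the skeletal pose that deforms the mesh also moves the Gaussians, which is what allows rendering and motion to be optimized together.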

📝 Abstract
We present Better Together, a method that simultaneously solves the human pose estimation problem while reconstructing a photorealistic 3D human avatar from multi-view videos. While prior art usually solves these problems separately, we argue that joint optimization of skeletal motion with a 3D renderable body model brings synergistic effects, i.e., yields more precise motion capture and improved visual quality of real-time rendering of avatars. To achieve this, we introduce a novel animatable avatar with 3D Gaussians rigged on a personalized mesh and propose to optimize the motion sequence with time-dependent MLPs that provide accurate and temporally consistent pose estimates. We first evaluate our method on highly challenging yoga poses and demonstrate state-of-the-art accuracy on multi-view human pose estimation, reducing error by 35% on body joints and 45% on hand joints compared to keypoint-based methods. At the same time, our method significantly boosts the visual quality of animatable avatars (+2 dB PSNR on novel view synthesis) on diverse challenging subjects.
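
For readers unfamiliar with the metric, the sketch below shows the standard PSNR definition and what a gain measured in dB implies for mean squared error. The function names `psnr` and `mse_ratio_for_gain` are hypothetical, and the image values are placeholders rather than results from the paper.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE).

    Assumes pred and target differ somewhere; MSE of zero is not handled.
    """
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks when PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)

# Placeholder images: a constant error of 0.1 on a [0, 1] intensity scale.
target = np.zeros((4, 4))
pred = np.full((4, 4), 0.1)
example_psnr = psnr(pred, target)   # MSE = 0.01, so 20 dB

ratio = mse_ratio_for_gain(2.0)     # about 0.631
```

Under this definition, a +2 dB improvement corresponds to the mean squared error dropping to roughly 63% of its previous value, all else being equal.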

Problem

Research questions and friction points this paper is trying to address.

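Simultaneous human pose estimation and photorealistic 3D avatar reconstruction from multi-view video.
Joint optimization of skeletal motion and a renderable body model, in place of solving the two problems separately.
More precise motion capture and better visual quality of real-time avatar rendering.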

Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint optimization of skeletal motion and 3D avatar model.
Animatable avatar with 3D Gaussians on personalized mesh.
Time-dependent MLPs for accurate, consistent pose estimation.
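
The time-dependent MLP idea can be sketched as a small network that maps a normalized timestamp to pose parameters, so that the whole motion sequence is represented by one smooth function instead of independent per-frame values. Everything specific here is an assumption for illustration: the Fourier time encoding, the layer sizes, the 24-joint axis-angle output, and the names `fourier_features` and `TimePoseMLP` are hypothetical, and the weights are random rather than optimized.

```python
import numpy as np

def fourier_features(t, num_freqs=4):
    """Encode timestamps in [0, 1] as sines and cosines (assumed encoding)."""
    freqs = 2.0 ** np.arange(num_freqs) * np.pi
    angles = np.atleast_1d(t)[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

class TimePoseMLP:
    """Two-layer MLP from time to per-joint rotations (hypothetical layout)."""

    def __init__(self, num_joints=24, hidden=64, num_freqs=4, seed=0):
        rng = np.random.default_rng(seed)
        self.num_joints = num_joints
        self.num_freqs = num_freqs
        self.w1 = rng.normal(0.0, 0.1, (2 * num_freqs, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, num_joints * 3))
        self.b2 = np.zeros(num_joints * 3)

    def __call__(self, t):
        hidden = np.tanh(fourier_features(t, self.num_freqs) @ self.w1 + self.b1)
        out = hidden @ self.w2 + self.b2
        # One 3-vector (for example an axis-angle rotation) per joint per frame.
        return out.reshape(-1, self.num_joints, 3)

model = TimePoseMLP()
timestamps = np.linspace(0.0, 1.0, 5)
poses = model(timestamps)   # shape (5, 24, 3)
```

Since the output is a continuous function of time, nearby frames receive nearby poses by construction, which is one plausible reason such a parameterization gives temporally consistent estimates. In a joint optimization setting, the weights would be fitted through the rendering objective rather than left at their random initial values.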