Free3D: 3D Human Motion Emerges from Single-View 2D Supervision

📅 2025-11-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of generating generalizable, semantically aligned, and temporally coherent 3D human motion from single-view 2D keypoint sequences, without any 3D motion annotations. We propose the Motion-Lifting Residual Quantized VAE (ML-RQ), a framework that integrates residual vector quantization, view-consistency constraints, orientation-coherence modeling, and physics-based losses to implicitly learn 3D structure and motion semantics under pure 2D supervision. Crucially, our method requires neither 3D ground truth nor multi-view data, so 3D motion emerges end-to-end solely from 2D sequences. Evaluated on the AMASS and 3DPW benchmarks, our generated motions match or surpass fully 3D-supervised approaches in diversity, temporal coherence, and semantic plausibility, which demonstrates a significant reduction in reliance on precise 3D annotations.

📝 Abstract
Recent 3D human motion generation models demonstrate remarkable reconstruction accuracy yet struggle to generalize beyond training distributions. This limitation arises partly from the use of precise 3D supervision, which encourages models to fit fixed coordinate patterns instead of learning the essential 3D structure and motion semantic cues required for robust generalization. To overcome this limitation, we propose Free3D, a framework that synthesizes realistic 3D motions without any 3D motion annotations. Free3D introduces a Motion-Lifting Residual Quantized VAE (ML-RQ) that maps 2D motion sequences into 3D-consistent latent spaces, and a suite of 3D-free regularization objectives enforcing view consistency, orientation coherence, and physical plausibility. Trained entirely on 2D motion data, Free3D generates diverse, temporally coherent, and semantically aligned 3D motions, achieving performance comparable to or even surpassing fully 3D-supervised counterparts. These results suggest that relaxing explicit 3D supervision encourages stronger structural reasoning and generalization, offering a scalable and data-efficient paradigm for 3D motion generation.
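
To make the "residual quantized" part of ML-RQ more concrete, here is a minimal NumPy sketch of generic residual vector quantization: each level picks the nearest code for whatever the previous levels left unexplained, and the reconstruction is the sum of the chosen codes. This illustrates the general technique only, not the paper's implementation. The function name `residual_quantize` and the toy codebooks are hypothetical.

```python
import numpy as np

def residual_quantize(z, codebooks):
    """Quantize vector z (shape [D]) with a stack of codebooks (each [K, D]).

    Returns the chosen code index per level and the summed reconstruction.
    """
    residual = np.asarray(z, dtype=float).copy()
    recon = np.zeros_like(residual)
    indices = []
    for codebook in codebooks:
        # Pick the code closest to the part of z not yet explained.
        dists = np.linalg.norm(codebook - residual, axis=1)
        k = int(np.argmin(dists))
        indices.append(k)
        recon += codebook[k]
        residual -= codebook[k]
    return indices, recon

# Toy example (assumed values): a coarse level followed by a finer level.
coarse = np.array([[1.0, 0.0], [0.0, 1.0]])
fine = np.array([[0.5, 0.0], [0.0, 0.5]])
indices, recon = residual_quantize(np.array([1.0, 0.4]), [coarse, fine])
```

In this toy case the coarse level captures the dominant direction and the fine level refines the remainder. In a trained model the codebooks would be learned jointly with the encoder and decoder.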
Problem

Research questions and friction points this paper is trying to address.

Generating 3D human motion without 3D supervision
Overcoming generalization limits in motion reconstruction models
Learning 3D structure from 2D motion sequences only
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses 2D motion sequences without 3D annotations
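Employs a Motion-Lifting Residual Quantized VAE (ML-RQ) to map 2D motion sequences into 3D-consistent latent spaces
Applies 3D-free regularization objectives enforcing view consistency, orientation coherence, and physical plausibility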
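
The idea of constraining 3D motion with 2D-only signals can be illustrated with a small NumPy sketch. The two terms below are assumptions chosen for illustration, not the paper's actual objectives: a reprojection error under an assumed orthographic camera, and a bone-length variance term as a simple stand-in for a physical-plausibility prior. All function names are hypothetical.

```python
import numpy as np

def rotate_y(joints, angle):
    """Rotate joints (shape [..., 3]) about the vertical (y) axis."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return joints @ rot.T

def project_orthographic(joints):
    """Assumed camera model: drop depth and keep (x, y)."""
    return joints[..., :2]

def reprojection_loss(pred_3d, keypoints_2d):
    """Mean squared error between projected 3D joints and observed 2D keypoints."""
    return float(np.mean((project_orthographic(pred_3d) - keypoints_2d) ** 2))

def bone_length_variance(pred_3d, bones):
    """Variance of each bone's length over time, averaged over bones.

    pred_3d has shape [T, J, 3]; bones is a list of (parent, child) joint indices.
    """
    parents = [p for p, _ in bones]
    children = [c for _, c in bones]
    lengths = np.linalg.norm(pred_3d[:, children] - pred_3d[:, parents], axis=-1)
    return float(np.mean(np.var(lengths, axis=0)))

# Toy 3-joint skeleton turning in place over 4 frames (assumed data).
skeleton = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 1.5, 0.0]])
motion = np.stack([rotate_y(skeleton, a) for a in np.linspace(0.0, np.pi / 2, 4)])
keypoints = project_orthographic(motion)
bones = [(0, 1), (1, 2)]

# Mirroring depth leaves the 2D projection unchanged.
flipped = motion * np.array([1.0, 1.0, -1.0])
```

Both `motion` and `flipped` reproduce the 2D keypoints exactly, which shows why reprojection alone leaves depth ambiguous and why additional regularizers, such as the view and orientation terms named above, are needed when no 3D labels are available.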