🤖 AI Summary
Existing VAE- and diffusion-based models for 3D human motion recovery suffer from poor robustness, temporal inconsistency, and physically implausible predictions. To address these issues, we propose the Neural Riemannian Motion Field (NRMF), the first method to construct a neural distance field (NDF) on the product Riemannian manifold of joint rotations, angular velocities, and angular accelerations, thereby explicitly encoding geometric constraints and dynamical priors of human motion. NRMF employs an adaptive-step projection algorithm and a geometric integrator to generate physically consistent, continuous motion trajectories on the zero-level set. Trained on the AMASS dataset, NRMF achieves significant improvements across tasks spanning multiple input modalities, including denoising, motion in-betweening, and fitting to partial 2D/3D observations, demonstrating superior generalization, spatiotemporal coherence, and physical plausibility compared to prior approaches.
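The "geometric integrator" mentioned above rolls out trajectories directly on the rotation manifold rather than in a flat parameter space. The paper's exact integrator is not given here; the sketch below shows the standard Lie-group (exponential-map) Euler scheme on SO(3) that such an integrator builds on: angular acceleration and velocity are updated in the tangent space, and the pose is advanced via Rodrigues' formula so every intermediate frame remains a valid rotation. All function names are illustrative, not the paper's API.

```python
import numpy as np

def hat(w):
    # Map a 3-vector to its skew-symmetric matrix (the so(3) "hat" operator).
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    # Rodrigues' formula: matrix exponential of hat(w), mapping so(3) -> SO(3).
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def rollout(R0, omega, alpha, dt, steps):
    # Lie-group Euler integration: integrate angular acceleration and
    # velocity in the tangent space, then move on the manifold via the
    # exponential map, so orthogonality of the pose is preserved exactly.
    R, w = R0.copy(), omega.copy()
    traj = [R]
    for _ in range(steps):
        w = w + dt * alpha           # update angular velocity
        R = R @ exp_so3(dt * w)      # geometric update on SO(3)
        traj.append(R)
    return traj
```

In a full motion prior this update would run per joint on the product manifold, with the velocity/acceleration updates guided by the learned distance fields rather than fixed inputs.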
📝 Abstract
We introduce Neural Riemannian Motion Fields (NRMF), a novel 3D generative human motion prior that enables robust, temporally consistent, and physically plausible 3D motion recovery. Unlike existing VAE- or diffusion-based methods, our higher-order motion prior explicitly models human motion as the zero-level set of a collection of neural distance fields (NDFs) corresponding to pose, transition (velocity), and acceleration dynamics. Our framework is rigorous in the sense that our NDFs are constructed on the product space of joint rotations, their angular velocities, and angular accelerations, respecting the geometry of the underlying articulations. We further introduce: (i) a novel adaptive-step hybrid algorithm for projecting onto the set of plausible motions, and (ii) a novel geometric integrator to "roll out" realistic motion trajectories during test-time optimization and generation. Our experiments show significant and consistent gains: trained on the AMASS dataset, NRMF generalizes remarkably well across multiple input modalities and diverse tasks ranging from denoising to motion in-betweening and fitting to partial 2D/3D observations.
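Projecting onto the zero-level set of a distance field is the core operation behind the adaptive-step algorithm described above. As a minimal sketch only, the toy below replaces the learned NDF with an analytic signed distance to the unit sphere and applies a Newton-like update x ← x − η·f(x)·∇f/‖∇f‖², shrinking the step whenever the distance fails to decrease; the paper's hybrid scheme on the motion manifold is more involved, and all names here are hypothetical.

```python
import numpy as np

def dist_field(x):
    # Toy stand-in for a learned neural distance field: signed distance
    # to the unit sphere. In NRMF the field is a trained network over
    # the motion manifold; this analytic field only illustrates projection.
    return np.linalg.norm(x) - 1.0

def grad_dist(x):
    # Analytic gradient of the toy field (autograd would supply this
    # for a learned field).
    return x / np.linalg.norm(x)

def project_to_zero_set(x, step=1.0, iters=50, tol=1e-8):
    # Newton-like projection with a crude adaptive-step rule: accept the
    # candidate only if |f| decreases, otherwise halve the step size.
    for _ in range(iters):
        f = dist_field(x)
        if abs(f) < tol:
            break
        g = grad_dist(x)
        cand = x - step * f * g / np.dot(g, g)
        if abs(dist_field(cand)) < abs(f):
            x = cand
        else:
            step *= 0.5
    return x
```

During test-time optimization, a step like this pulls a noisy or partially observed pose/velocity/acceleration estimate back onto the set of plausible motions before the next integrator rollout.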