🤖 AI Summary
To address incomplete full-body motion reconstruction in AR/VR caused by reliance on head-mounted displays (HMDs) and handheld controllers alone, this paper proposes a lightweight, real-time full-body pose estimation method. The approach makes three key contributions: (1) a learnable Memory-Block module that implicitly models the temporal dynamics of occluded or unobserved joints via a discrete codebook; (2) an MLP-based backbone with residual connections and a multi-task learning framework that jointly optimizes pose prediction, velocity consistency, and joint-angle constraints; and (3) end-to-end training that achieves 72 FPS on mobile HMDs. Experiments demonstrate that the method achieves the best trade-off between accuracy (reducing MPJPE by 18.3%) and computational efficiency, significantly outperforming existing sparse-input-driven approaches to full-body pose estimation.
📝 Abstract
Realistic and smooth full-body tracking is crucial for immersive AR/VR applications. Existing systems primarily track the head and hands via head-mounted displays (HMDs) and handheld controllers, leaving the 3D full-body reconstruction incomplete. One potential approach is to generate full-body motion from the sparse inputs collected by these limited sensors using a neural network (NN) model. In this paper, we propose a novel method based on a multi-layer perceptron (MLP) backbone enhanced with residual connections and a novel NN component called the Memory-Block. In particular, the Memory-Block represents missing sensor data with trainable code-vectors, which are combined with the sparse signals from previous time instances to improve temporal consistency. Furthermore, we formulate our solution as a multi-task learning problem, allowing the MLP backbone to learn robust representations that boost accuracy. Our experiments show that our method outperforms state-of-the-art baselines, substantially reducing prediction errors. Moreover, it runs at 72 FPS on mobile HMDs, improving the accuracy–runtime trade-off.
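The core idea of the Memory-Block (filling unobserved joint channels with trainable code-vectors before a residual MLP) can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the joint count, feature sizes, sensor joint indices, and function names are all assumptions, and fixed random vectors stand in for the code-vectors that would actually be learned end-to-end.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_JOINTS = 22   # e.g. a SMPL-style body skeleton (assumption)
FEAT_DIM = 16     # per-joint feature size (assumption)

# "Codebook": one code-vector per joint. In the paper these are
# trainable; here they are fixed random stand-ins.
codebook = rng.normal(size=(NUM_JOINTS, FEAT_DIM))

def memory_block(features, observed_mask):
    """Replace features of unobserved joints with code-vectors."""
    mask = observed_mask[:, None]                # (J, 1)
    return mask * features + (1.0 - mask) * codebook

def mlp_residual(x, w1, w2):
    """Two-layer MLP with a residual connection."""
    h = np.maximum(x @ w1, 0.0)                  # ReLU hidden layer
    return x + h @ w2                            # residual add

# Sparse input: only head and two hands observed (3 of 22 joints).
features = np.zeros((NUM_JOINTS, FEAT_DIM))
observed = np.zeros(NUM_JOINTS)
for j in (0, 1, 2):                              # hypothetical sensor joints
    features[j] = rng.normal(size=FEAT_DIM)
    observed[j] = 1.0

filled = memory_block(features, observed)
w1 = 0.1 * rng.normal(size=(FEAT_DIM, 32))
w2 = 0.1 * rng.normal(size=(32, FEAT_DIM))
pose_features = mlp_residual(filled, w1, w2)

# Unobserved joints now carry codebook content instead of zeros,
# while observed joints keep their sensor-derived features.
print(np.allclose(filled[3], codebook[3]))       # True
print(pose_features.shape)                       # (22, 16)
```

In a trained model the codebook entries, updated by backpropagation, would encode plausible motion priors for the joints the sensors never see, which is what lets the network hallucinate a temporally consistent full-body pose from head-and-hands input only.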