Mem-MLP: Real-Time 3D Human Motion Generation from Sparse Inputs

📅 2025-11-20
🤖 AI Summary
To address incomplete full-body motion reconstruction in AR/VR systems that track only a head-mounted display (HMD) and handheld controllers, this paper proposes a lightweight, real-time full-body pose estimation method. It makes three contributions: (1) a learnable Memory-Block module that implicitly models the temporal dynamics of occluded or unobserved joints via a discrete codebook; (2) an MLP-based backbone with residual connections, trained in a multi-task framework that jointly optimizes pose prediction, velocity consistency, and joint-angle constraints; and (3) an end-to-end trained model that runs at 72 FPS on mobile HMDs. Experiments show the best trade-off between accuracy (an 18.3% reduction in MPJPE) and computational efficiency, significantly outperforming existing sparse-input approaches to full-body pose estimation.
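The summary reports accuracy as MPJPE (mean per-joint position error), which is the average Euclidean distance between predicted and ground-truth joint positions. A minimal NumPy sketch of the metric follows; the array layout `(frames, joints, 3)`, the joint count of 22, and the toy values are illustrative assumptions, not data from the paper.

```python
import numpy as np

def mpjpe(pred, target):
    """Mean per-joint position error.

    pred, target: arrays of shape (..., J, 3) holding joint positions
    in the same units (the layout is an assumption for this sketch).
    """
    per_joint = np.linalg.norm(pred - target, axis=-1)  # distance per joint
    return float(per_joint.mean())

# Toy data: every joint is displaced by (3, 4, 0), so each error is 5.
target = np.zeros((2, 22, 3))
pred = target.copy()
pred[..., 0] += 3.0
pred[..., 1] += 4.0
print(mpjpe(pred, target))  # 5.0
```

A lower value means predicted joints sit closer to the ground truth.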

📝 Abstract
Realistic and smooth full-body tracking is crucial for immersive AR/VR applications. Existing systems primarily track the head and hands via Head Mounted Devices (HMDs) and controllers, leaving the 3D full-body reconstruction incomplete. One potential approach is to generate full-body motion from the sparse inputs collected by these limited sensors using a Neural Network (NN) model. In this paper, we propose a novel method based on a multi-layer perceptron (MLP) backbone that is enhanced with residual connections and a novel NN component called Memory-Block. In particular, Memory-Block represents missing sensor data with trainable code-vectors, which are combined with the sparse signals from previous time instances to improve temporal consistency. Furthermore, we formulate our solution as a multi-task learning problem, allowing our MLP backbone to learn robust representations that boost accuracy. Our experiments show that our method outperforms state-of-the-art baselines by substantially reducing prediction errors. Moreover, it achieves 72 FPS on mobile HMDs, which ultimately improves the trade-off between accuracy and running time.
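The abstract does not say how the code-vectors are combined with earlier sparse signals, so the sketch below is only one plausible reading and not the authors' architecture. It assumes the previous frame's sparse input queries a small bank of trainable code-vectors through a softmax weighting, and that the resulting memory feature is concatenated with the current sparse input before a residual MLP. Every dimension, the weighting scheme, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are assumptions for illustration, not values from the paper.
SPARSE_DIM = 54   # features from head + two hand trackers
CODE_DIM = 32     # size of each code-vector
NUM_CODES = 8     # number of trainable code-vectors
HIDDEN = 64       # MLP width
POSE_DIM = 132    # e.g. 22 joints x 6D rotation

def init_linear(n_in, n_out):
    return rng.normal(0.0, 0.02, (n_in, n_out)), np.zeros(n_out)

params = {
    "codes": rng.normal(0.0, 0.02, (NUM_CODES, CODE_DIM)),  # trainable in practice
    "query": init_linear(SPARSE_DIM, CODE_DIM),
    "inp": init_linear(SPARSE_DIM + CODE_DIM, HIDDEN),
    "res1": init_linear(HIDDEN, HIDDEN),
    "res2": init_linear(HIDDEN, HIDDEN),
    "out": init_linear(HIDDEN, POSE_DIM),
}

def linear(x, wb):
    w, b = wb
    return x @ w + b

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def memory_block(prev_sparse, p):
    """Assumed combination rule: the previous frame's signal selects a soft
    mixture of code-vectors that stands in for the missing sensors."""
    q = linear(prev_sparse, p["query"])      # (B, CODE_DIM)
    weights = softmax(q @ p["codes"].T)      # (B, NUM_CODES)
    return weights @ p["codes"], weights     # (B, CODE_DIM), (B, NUM_CODES)

def forward(curr_sparse, prev_sparse, p):
    mem, _ = memory_block(prev_sparse, p)
    x = np.concatenate([curr_sparse, mem], axis=-1)
    h = np.maximum(linear(x, p["inp"]), 0.0)
    for name in ("res1", "res2"):
        h = h + np.maximum(linear(h, p[name]), 0.0)  # residual connection
    return linear(h, p["out"])

prev = rng.normal(size=(4, SPARSE_DIM))
curr = rng.normal(size=(4, SPARSE_DIM))
pose = forward(curr, prev, params)
print(pose.shape)  # (4, 132)
```

The sketch only runs a forward pass with random weights. In a trained model the code-vectors and linear layers would be learned jointly, which is what "trainable code-vectors" suggests.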
Problem

Research questions and friction points this paper is trying to address.

Generating full-body motion from sparse sensor inputs
Addressing incomplete 3D reconstruction in AR/VR systems
Improving temporal consistency of generated human motions
Innovation

Methods, ideas, or system contributions that make the work stand out.

MLP backbone enhanced with residual connections
Memory-Block represents missing data with code-vectors
Multi-task learning formulation for robust representations
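A multi-task formulation is commonly implemented as a weighted sum of per-task losses. The sketch below combines a pose term with a velocity-consistency term, two of the tasks named in the AI summary above. The L1 form of each term, the weights `w_pose` and `w_vel`, and the `(T, J, 3)` input layout are assumptions for illustration; the page does not give the paper's actual loss definitions.

```python
import numpy as np

def multitask_loss(pred, target, w_pose=1.0, w_vel=0.5):
    """Weighted sum of a pose loss and a velocity-consistency loss.

    pred, target: (T, J, 3) joint positions over T frames (assumed layout).
    w_pose, w_vel: hypothetical task weights.
    """
    pose_loss = np.mean(np.abs(pred - target))
    # Velocity is approximated by frame-to-frame differences.
    vel_pred = pred[1:] - pred[:-1]
    vel_tgt = target[1:] - target[:-1]
    vel_loss = np.mean(np.abs(vel_pred - vel_tgt))
    total = w_pose * pose_loss + w_vel * vel_loss
    return total, {"pose": pose_loss, "velocity": vel_loss}

rng = np.random.default_rng(0)
target = rng.normal(size=(10, 22, 3))
# A constant offset shifts every pose but leaves velocities unchanged.
total, parts = multitask_loss(target + 0.1, target)
print(round(float(total), 6), round(float(parts["velocity"]), 6))
```

With a constant offset the pose term is nonzero while the velocity term stays near zero, which shows why the two terms penalize different kinds of error: the first targets positional accuracy, the second targets temporal smoothness.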
Sinan Mutlu
Samsung R&D Institute UK (SRUK)
Georgios F. Angelis
Information Technologies Institute, CERTH
Savas Ozkan
Samsung R&D Institute UK (SRUK)
Paul Wisbey
Samsung R&D Institute UK (SRUK)
Anastasios Drosou
CERTH-ITI
Mete Ozay
Samsung R&D Institute UK (SRUK)