Progressive Inertial Poser: Progressive Real-Time Kinematic Chain Estimation for 3D Full-Body Pose from Three IMU Sensors

📅 2025-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the need for hardware-lightweight, environment-robust full-body pose estimation in VR. We propose a real-time method that reconstructs 3D full-body pose from only three IMUs, mounted on the head and both wrists. Unlike mainstream approaches that rely on pelvis or lower-limb sensors or on external vision, our method uses a progressive multi-stage network: a Transformer-enhanced bidirectional LSTM (TE-biLSTM) encoder, coupled with SMPL parameter regression, biomechanical priors, and hierarchical kinematic chain optimization, which together enforce whole-body motion constraints without lower-body sensing. To our knowledge, this is the first method to achieve near-6-IMU accuracy with a 3-IMU configuration. It outperforms prior state-of-the-art methods with identical input modalities across multiple public benchmarks, reducing mean joint error by 12.7% and keeping end-to-end latency under 15 ms, which improves wearability and practical deployment in VR applications.

📝 Abstract

Motion capture systems that support full-body virtual representation are of key significance for virtual reality. Compared to vision-based systems, full-body pose estimation from sparse tracking signals is not limited by environmental conditions or recording range. However, previous works either require additional sensors on the pelvis and lower body or rely on external visual sensors to obtain global positions of key joints. To improve the practicality of the technology for virtual reality applications, we estimate full-body poses using only inertial data obtained from three Inertial Measurement Unit (IMU) sensors worn on the head and wrists, thereby reducing the complexity of the hardware system. In this work, we propose a method called Progressive Inertial Poser (ProgIP) for human pose estimation, which combines neural network estimation with a human dynamics model, considers the hierarchical structure of the kinematic chain, and employs multi-stage progressive network estimation with increased depth to reconstruct full-body motion in real time. The encoder combines a Transformer encoder with a bidirectional LSTM (TE-biLSTM) to flexibly capture the temporal dependencies of the inertial sequence, while the decoder, based on multi-layer perceptrons (MLPs), transforms high-dimensional features and accurately projects them onto Skinned Multi-Person Linear (SMPL) model parameters. Quantitative and qualitative experimental results on multiple public datasets show that our method outperforms state-of-the-art methods with the same inputs and is comparable to recent works using six IMU sensors.
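To make the "hierarchical structure of the kinematic chain" concrete, the sketch below walks the standard 24-joint SMPL body tree. It is only an illustration, not the ProgIP staging, which the abstract does not spell out. The parent table is the usual SMPL joint hierarchy; the helper names (`chain_to_root`, `joints_by_depth`) and the choice of joints 15, 20, and 21 as the sensor-bearing head and wrists are assumptions made for this example.

```python
# Parent index for each of the 24 SMPL body joints; -1 marks the root (pelvis).
SMPL_PARENTS = [-1, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8,
                9, 9, 9, 12, 13, 14, 16, 17, 18, 19, 20, 21]

# Assumption: the three IMUs sit on the head (15) and the two wrists (20, 21).
SENSOR_JOINTS = {"head": 15, "left_wrist": 20, "right_wrist": 21}


def chain_to_root(joint, parents=SMPL_PARENTS):
    """Return the joint indices on the path from `joint` up to the root."""
    chain = []
    while joint != -1:
        chain.append(joint)
        joint = parents[joint]
    return chain


def joints_by_depth(parents=SMPL_PARENTS):
    """Group joint indices by depth in the kinematic tree (root at depth 0)."""
    depth = []
    for parent in parents:
        # Each parent index is smaller than its child's, so depth[parent] exists.
        depth.append(0 if parent == -1 else depth[parent] + 1)
    levels = [[] for _ in range(max(depth) + 1)]
    for joint, d in enumerate(depth):
        levels[d].append(joint)
    return levels


# Joints that lie on some sensor-to-root chain, and joints that lie on none.
on_chain = set()
for sensor_joint in SENSOR_JOINTS.values():
    on_chain.update(chain_to_root(sensor_joint))
off_chain = sorted(set(range(len(SMPL_PARENTS))) - on_chain)

if __name__ == "__main__":
    print("head chain:", chain_to_root(SENSOR_JOINTS["head"]))
    print("joints on no sensor chain:", off_chain)
```

Under these assumptions, the joints on no sensor-to-root chain are the hips, knees, ankles, and feet (plus the two hand joints beyond the wrists). This is one way to see why lower-body motion must be inferred from learned priors and temporal context rather than measured directly.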

Problem

Research questions and friction points this paper is trying to address.

Estimates full-body poses using only three IMU sensors

Combines neural networks with a human dynamics model

Reconstructs real-time motion without lower-body sensors

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses only three IMU sensors on head and wrists

Combines a neural network with a human dynamics model

Employs Transformer Encoder and bidirectional LSTM
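As a rough illustration of what "only three IMU sensors" means for the network input, the sketch below packs one frame of readings into a flat feature vector. The layout is an assumption, not taken from the paper: each IMU is taken to provide a 3x3 orientation matrix and a 3-vector acceleration, giving 12 values per sensor and 36 per frame. The names `IMU_ORDER` and `pack_frame` are hypothetical.

```python
# Assumed fixed sensor order; the paper only states head and wrists.
IMU_ORDER = ("head", "left_wrist", "right_wrist")


def pack_frame(readings):
    """Flatten one frame of IMU readings into a single feature list.

    `readings` maps each sensor name to (rotation, acceleration), where
    rotation is a 3x3 nested list and acceleration is a length-3 list.
    """
    features = []
    for name in IMU_ORDER:
        rotation, acceleration = readings[name]
        if len(rotation) != 3 or any(len(row) != 3 for row in rotation):
            raise ValueError(f"{name}: rotation must be 3x3")
        if len(acceleration) != 3:
            raise ValueError(f"{name}: acceleration must have 3 components")
        # Row-major rotation entries first, then the acceleration vector.
        features.extend(value for row in rotation for value in row)
        features.extend(acceleration)
    return features


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    frame = {name: (identity, [0.0, 0.0, 9.8]) for name in IMU_ORDER}
    print(len(pack_frame(frame)))
```

A sequence of such per-frame vectors would then form the temporal input that an encoder such as the TE-biLSTM consumes; the actual preprocessing and normalization used by ProgIP may differ.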
