AI Summary
Existing event camera-based egocentric 3D human pose estimation methods suffer from limited accuracy and high sensitivity to self-occlusion and temporal jitter. This work proposes E-3DPSM, the first continuous pose state machine tailored for event streams, which aligns fine-grained event dynamics with human motion by explicitly modeling the asynchronous and continuous nature of event data. By integrating event-driven joint position evolution with direct 3D pose prediction, E-3DPSM enables stable, drift-free pose reconstruction. The end-to-end trainable architecture achieves state-of-the-art performance on two benchmarks, improving MPJPE by up to 19% and enhancing temporal stability by 2.7x, while supporting real-time inference at 80 Hz.
Abstract
Event cameras offer multiple advantages for monocular egocentric 3D human pose estimation from head-mounted devices, such as millisecond temporal resolution, high dynamic range, and negligible motion blur. Existing methods effectively leverage these properties but suffer from low 3D estimation accuracy that is insufficient for many applications (e.g., immersive VR/AR). This is because their designs are not fully tailored to event streams (e.g., their asynchronous and continuous nature), leading to high sensitivity to self-occlusions and temporal jitter in the estimates. This paper rethinks the setting and introduces E-3DPSM, an event-driven continuous pose state machine for event-based egocentric 3D human pose estimation. E-3DPSM aligns continuous human motion with fine-grained event dynamics: it evolves latent states and predicts continuous changes in 3D joint positions associated with observed events, which are fused with direct 3D human pose predictions, leading to stable and drift-free final 3D pose reconstructions. E-3DPSM runs in real time at 80 Hz on a single workstation and sets a new state of the art in experiments on two benchmarks, improving accuracy by up to 19% (MPJPE) and temporal stability by up to 2.7x. See our project page for the source code and trained models.
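The core mechanism the abstract describes can be illustrated with a minimal sketch: a latent state is evolved per asynchronous event, decoded into continuous 3D joint-position deltas, and the integrated pose is periodically fused with a direct absolute 3D prediction so that delta accumulation does not drift. Everything below (joint count, state dimension, linear stand-ins for the learned networks, and the blending-based fusion rule) is an assumption for illustration, not the paper's actual architecture.

```python
import numpy as np

# Illustrative sketch of an event-driven continuous pose state machine.
# All dimensions and the fusion rule are assumptions, NOT E-3DPSM's design.

J = 16          # number of body joints (assumed)
D = 32          # latent state dimension (assumed)
rng = np.random.default_rng(0)

# Stand-ins for learned networks: fixed random linear maps.
W_state = rng.standard_normal((D, D)) * 0.01      # latent state transition
W_event = rng.standard_normal((D, 4)) * 0.01      # event (x, y, t, polarity) encoder
W_delta = rng.standard_normal((J * 3, D)) * 0.01  # latent -> per-joint 3D deltas

def evolve(h, event):
    """Advance the latent state with one asynchronous event."""
    return np.tanh(W_state @ h + W_event @ event)

def predict_delta(h):
    """Decode a continuous change in 3D joint positions from the state."""
    return (W_delta @ h).reshape(J, 3)

def fuse(integrated_pose, direct_pose, alpha=0.9):
    """Blend the event-integrated pose with a direct absolute prediction;
    the absolute branch anchors the estimate so delta accumulation
    cannot drift unboundedly (a hypothetical fusion rule)."""
    return alpha * integrated_pose + (1.0 - alpha) * direct_pose

# Toy run: integrate a stream of random events, then fuse.
h = np.zeros(D)
pose = np.zeros((J, 3))                # pose at the last anchor
for _ in range(100):                   # 100 asynchronous events
    ev = rng.standard_normal(4)
    h = evolve(h, ev)
    pose = pose + predict_delta(h)     # event-driven pose evolution

direct = rng.standard_normal((J, 3)) * 0.05  # stand-in direct 3D prediction
pose = fuse(pose, direct)              # drift correction
print(pose.shape)                      # (16, 3)
```

The key design point this sketch mirrors is the two-branch structure: the event branch captures fine-grained continuous motion between anchors, while the direct branch keeps the estimate globally consistent.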