Map-Mono-Ego: Map-Grounded Global Human Pose Estimation from Monocular Egocentric Video

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses scale ambiguity and long-term trajectory drift in human pose estimation from monocular egocentric video, both of which prevent accurate recovery of the user’s absolute position in the environment. To overcome these limitations, we propose MapMonoEgo, the first framework to integrate pre-scanned 3D point cloud maps with monocular egocentric video for globally consistent pose and trajectory estimation. We also introduce AIST-Living, the first dataset to pair real-world motion trajectories with egocentric videos. Experimental results on AIST-Living show that, without requiring additional sensor hardware, our method significantly outperforms existing approaches and achieves temporally stable global human pose tracking over extended durations.

📝 Abstract

Monocular egocentric human pose estimation is essential for ubiquitous activity monitoring. However, recovering the user's absolute location within the environment remains a challenge: existing methods primarily estimate motion relative to an initial position and tend not to account for where the wearer actually is in the environment. Furthermore, the inherent scale ambiguity of monocular vision leads to severe translational drift, limiting long-term tracking without specialized multi-sensor hardware. To address this, we propose MapMonoEgo, a novel framework that achieves globally consistent human pose estimation solely from a monocular camera by leveraging a pre-scanned 3D point cloud. We also introduce the AIST-Living dataset, which pairs egocentric video with ground-truth motion in a scanned environment. Experiments demonstrate that our approach significantly outperforms the state-of-the-art baseline, showing its utility for practical monitoring tasks without specialized hardware.
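
The scale ambiguity mentioned above can be illustrated with a small sketch. This is not the MapMonoEgo method; it is a toy pinhole-camera example with a hypothetical focal length, made-up points, and a made-up camera path. It only shows that scaling the scene and the camera trajectory by the same factor leaves every image measurement unchanged, so a single camera cannot tell a short walk in a small room from a long walk in a large one unless a metric reference such as a pre-scanned map is available.

```python
import math

def project(point, cam_pos, focal=500.0):
    """Pinhole projection for a camera at cam_pos looking along +Z (no rotation).

    The focal length is an arbitrary illustrative value.
    """
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    return (focal * x / z, focal * y / z)

def scale(vec, s):
    """Scale a 3D vector by s."""
    return tuple(s * v for v in vec)

def render(points, cam_path):
    """Project every point from every camera position."""
    return [[project(p, c) for p in points] for c in cam_path]

def path_length(cam_path):
    """Total distance travelled along the camera path."""
    return sum(math.dist(a, b) for a, b in zip(cam_path, cam_path[1:]))

# Made-up scene points and camera positions (arbitrary units).
points = [(0.5, 0.2, 4.0), (-1.0, 0.3, 6.0), (0.8, -0.4, 5.0)]
cam_path = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.5), (0.2, 0.0, 1.0)]

# Scale both the scene and the trajectory by the same factor.
s = 2.0
scaled_points = [scale(p, s) for p in points]
scaled_path = [scale(c, s) for c in cam_path]

original = render(points, cam_path)
scaled = render(scaled_points, scaled_path)

# Largest difference between the two sets of image coordinates.
max_diff = max(
    abs(a - b)
    for frame_o, frame_s in zip(original, scaled)
    for uv_o, uv_s in zip(frame_o, frame_s)
    for a, b in zip(uv_o, uv_s)
)

print("max image difference:", max_diff)
print("path lengths:", path_length(cam_path), path_length(scaled_path))
```

The two trajectories differ in length by the factor `s` yet produce the same images, which is why image data alone leaves the travelled distance undetermined. Known metric geometry, for example distances taken from a scanned point cloud, is one way to pin down that factor.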

Problem

Research questions and friction points this paper is trying to address.

monocular egocentric vision

absolute pose estimation

scale ambiguity

translational drift

global localization

Innovation

Methods, ideas, or system contributions that make the work stand out.

monocular egocentric vision

global human pose estimation

3D point cloud map