OnlineHMR: Video-based Online World-Grounded Human Mesh Recovery

📅 2026-03-18
🤖 AI Summary
This work addresses real-time 3D human mesh recovery in online settings such as AR/VR, where existing methods rely on future frames or global optimization and therefore cannot meet latency and causality requirements. We propose OnlineHMR, a fully online framework for world-coordinate human mesh reconstruction that jointly targets causality, faithfulness, temporal consistency, and efficiency. Our approach builds on a two-branch architecture with a causal key-value cache and a sliding-window learning strategy, and uses a human-centric incremental SLAM module for world-grounded alignment with physically plausible trajectory correction. Experiments show performance comparable to existing chunk-based approaches on the EMDB benchmark and on highly dynamic custom videos, while supporting truly online inference.

📝 Abstract
Human mesh recovery (HMR) reconstructs the 3D human body from monocular video, and recent work extends it to world-coordinate human trajectory and motion reconstruction. However, most existing methods remain offline: they rely on future frames or global optimization, which limits their applicability in interactive-feedback and perception-action-loop scenarios such as AR/VR and telepresence. To address this, we propose OnlineHMR, a fully online framework that jointly satisfies four essential criteria of online processing: system-level causality, faithfulness, temporal consistency, and efficiency. Built upon a two-branch architecture, OnlineHMR enables streaming inference via a causal key-value cache design and a curated sliding-window learning strategy. Meanwhile, a human-centric incremental SLAM provides online world-grounded alignment with physically plausible trajectory correction. Experimental results show that our method achieves performance comparable to existing chunk-based approaches on the standard EMDB benchmark and on highly dynamic custom videos, while uniquely supporting online processing. The project page and code are available at https://tsukasane.github.io/Video-OnlineHMR/.
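
To make the idea of streaming inference with a causal key-value cache more concrete, here is a minimal sketch in plain Python. It is not the OnlineHMR implementation: the class `SlidingWindowKVCache`, the functions `attend` and `stream`, the identity projections for queries, keys, and values, and the window size are all assumptions chosen for illustration. It only shows the general pattern in which each frame attends to a bounded window of past frames and never to future ones.

```python
import math
from collections import deque


class SlidingWindowKVCache:
    """Stores keys/values for only the most recent `window` frames (hypothetical helper)."""

    def __init__(self, window):
        # deque(maxlen=...) silently drops the oldest entry once the window is full.
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def attend(query, cache):
    """Scaled dot-product attention of one query over the cached frames."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in cache.keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(cache.values[0])
    return [
        sum(w * v[d] for w, v in zip(weights, cache.values)) / total
        for d in range(dim)
    ]


def stream(frames, window):
    """Process frames one at a time; each output uses only current and past frames."""
    cache = SlidingWindowKVCache(window)
    outputs = []
    for feat in frames:
        # Assumption: identity projections stand in for learned Q/K/V layers.
        cache.append(feat, feat)
        outputs.append(attend(feat, cache))
    return outputs
```

Because the cache is filled before each query and never sees later frames, changing a future frame cannot alter an earlier output, and memory stays bounded by the window size regardless of video length.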

Problem

Research questions and friction points this paper is trying to address.

Human Mesh Recovery
Online Processing
World-Grounded Reconstruction
Temporal Consistency
Causal Inference

Innovation

Methods, ideas, or system contributions that make the work stand out.

online human mesh recovery
causal inference
incremental SLAM
world-grounded reconstruction
streaming video processing