🤖 AI Summary
Existing methods for video-based human mesh reconstruction often yield physically implausible results due to their reliance on inaccurate intermediate 3D pose representations and difficulties in modeling complex spatiotemporal dynamics. To address these limitations, this work proposes HMRMamba, a novel framework that introduces structured state space models (SSMs) to this task for the first time. The approach features a dual-scan Mamba architecture for geometry-aware 2D-to-3D pose lifting and a motion-guided temporal reconstruction network that explicitly models human kinematic patterns. Evaluated on 3DPW, MPI-INF-3DHP, and Human3.6M, HMRMamba achieves new state-of-the-art performance, significantly outperforming existing methods in reconstruction accuracy, temporal consistency, and computational efficiency.
📝 Abstract
Existing video-based 3D Human Mesh Recovery (HMR) methods often produce physically implausible results, stemming from their reliance on flawed intermediate 3D pose anchors and their inability to effectively model complex spatiotemporal dynamics. To overcome these deep-rooted architectural problems, we introduce HMRMamba, a new paradigm for HMR that pioneers the use of Structured State Space Models (SSMs), chosen for their efficiency and long-range modeling ability. Our framework is distinguished by two core contributions. First, the Geometry-Aware Lifting Module, featuring a novel dual-scan Mamba architecture, creates a robust foundation for reconstruction. It grounds the 2D-to-3D pose lifting process directly in geometric cues from image features, producing a highly reliable 3D pose sequence that serves as a stable anchor. Second, the Motion-guided Reconstruction Network leverages this anchor to explicitly process kinematic patterns over time. By injecting this temporal awareness, it significantly enhances the final mesh's coherence and robustness, particularly under occlusion and motion blur. Comprehensive evaluations on the 3DPW, MPI-INF-3DHP, and Human3.6M benchmarks confirm that HMRMamba sets a new state of the art, outperforming existing methods in both reconstruction accuracy and temporal consistency while offering superior computational efficiency.
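For readers unfamiliar with state space models, the sketch below illustrates the kind of linear recurrence an SSM applies along a sequence, and one possible reading of "dual-scan" as a forward pass plus a backward pass whose outputs are summed. This is an assumption for illustration only: the abstract does not specify the scan directions or how they are fused, and Mamba itself uses input-dependent (selective) parameters rather than the fixed `a`, `b`, `c` used here. The function names `ssm_scan` and `dual_scan` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def ssm_scan(x: Sequence[float], a: Sequence[float],
             b: Sequence[float], c: Sequence[float]) -> List[float]:
    """Run a diagonal linear SSM over a 1-D sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimensions)
    y_t = sum(c * h_t)
    Parameters are fixed here; Mamba makes them depend on the input.
    """
    h = [0.0] * len(a)  # hidden state starts at zero
    y = []
    for x_t in x:
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        y.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return y


def dual_scan(x: Sequence[float], a: Sequence[float],
              b: Sequence[float], c: Sequence[float]) -> List[float]:
    """Assumed 'dual scan': sum of a forward scan and a backward scan."""
    seq = list(x)
    forward = ssm_scan(seq, a, b, c)
    # Scan the reversed sequence, then flip the outputs back into time order.
    backward = ssm_scan(seq[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    # A single impulse at t=0 with a one-dimensional state and decay 0.5.
    print(dual_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
```

Each scan costs time linear in the sequence length, which is the property usually cited when SSMs are described as efficient for long sequences. Combining two directions lets every time step draw on both past and future frames.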