🤖 AI Summary
Existing HMR datasets suffer from occlusion, illumination variability, and privacy constraints, while mmWave radar datasets are limited by sparse annotations, small scale, and narrow action diversity. To address these limitations, we introduce the first large-scale multimodal mmWave radar-based human mesh recovery benchmark, comprising 661K frames, with synchronized acquisition of high-resolution mmWave tensors/point clouds, RGB-D imagery, and motion-capture-grade 3D mesh and trajectory annotations. Our benchmark uniquely enables fine-grained RF signal modeling across complex scenarios including free-form movement and rehabilitation tasks, and supports both unimodal radar and RGB-D fusion evaluation. We establish new baselines on both the raw radar tensor (RT) and radar point cloud (RPC) modalities, as well as on multimodal fusion with RGB-D. Empirical results demonstrate mmWave radar's robustness under occlusion, low-light conditions, and privacy-sensitive settings, while also revealing modeling bottlenecks in fast, unconstrained motion.
📝 Abstract
Human mesh reconstruction (HMR) provides direct insights into body-environment interaction, enabling various immersive applications. While existing large-scale HMR datasets rely heavily on line-of-sight RGB input, vision-based sensing is limited by occlusion, lighting variation, and privacy concerns. To overcome these limitations, recent efforts have explored radio-frequency (RF) mmWave radar for privacy-preserving indoor human sensing. However, current radar datasets are constrained by sparse skeleton labels, limited scale, and simple in-place actions. To advance HMR research, we introduce M4Human, the largest multimodal benchmark to date (661K frames, $9\times$ the prior largest), featuring high-resolution mmWave radar, RGB, and depth data. M4Human provides both raw radar tensors (RT) and processed radar point clouds (RPC) to enable research across different levels of RF signal granularity. It includes high-quality motion capture (MoCap) annotations with 3D meshes and global trajectories, and spans 20 subjects and 50 diverse actions, including in-place, sit-in-place, and free-space sports and rehabilitation movements. We establish benchmarks on both RT and RPC modalities, as well as on multimodal fusion with RGB-D. Extensive results highlight the significance of M4Human for radar-based human modeling while revealing persistent challenges under fast, unconstrained motion. The dataset and code will be released upon publication.