A Dataset and Evaluation for Complex 4D Markerless Human Motion Capture

📅 2026-04-14

🤖 AI Summary
Existing markerless 4D human motion capture methods degrade substantially in complex real-world scenarios such as multi-person interactions, severe occlusions, and rapid position exchanges, in part because existing benchmarks do not reflect these conditions. To address this gap, this work introduces a dataset and benchmark for challenging markerless 4D human motion capture, covering both single- and multi-person scenarios with intricate motions, frequent inter-person occlusions, rapid position exchanges between similarly dressed subjects, and varying subject distances. The dataset provides synchronized multi-view RGB and depth sequences, accurate camera calibration, ground-truth 3D motion capture from a Vicon system, and corresponding SMPL/SMPL-X parameters. Benchmark evaluations show substantial performance drops for state-of-the-art methods under these conditions, while targeted fine-tuning improves generalization, supporting the dataset's difficulty and practical utility.

📝 Abstract
Marker-based motion capture (MoCap) systems have long been the gold standard for accurate 4D human modeling, yet their reliance on specialized hardware and markers limits scalability and real-world deployment. Advancing reliable markerless 4D human motion capture requires datasets that reflect the complexity of real-world human interactions. Yet, existing benchmarks often lack realistic multi-person dynamics, severe occlusions, and challenging interaction patterns, leading to a persistent domain gap. In this work, we present a new dataset and evaluation for complex 4D markerless human motion capture. Our proposed MoCap dataset captures both single and multi-person scenarios with intricate motions, frequent inter-person occlusions, rapid position exchanges between similarly dressed subjects, and varying subject distances. It includes synchronized multi-view RGB and depth sequences, accurate camera calibration, ground-truth 3D motion capture from a Vicon system, and corresponding SMPL/SMPL-X parameters. This setup ensures precise alignment between visual observations and motion ground truth. Benchmarking state-of-the-art markerless MoCap models reveals substantial performance degradation under these realistic conditions, highlighting limitations of current approaches. We further demonstrate that targeted fine-tuning improves generalization, validating the dataset's realism and value for model development. Our evaluation exposes critical gaps in existing models and provides a rigorous foundation for advancing robust markerless 4D human motion capture.
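
As a rough illustration of what alignment between visual observations and motion ground truth can mean in practice, the sketch below projects 3D ground-truth joints into one camera view using calibration parameters. It assumes an undistorted pinhole camera; the function name `project_points`, the matrices `K`, `R`, `t`, and all numeric values are hypothetical and do not reflect the dataset's actual file format or calibration.

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Project Nx3 world-space points to Nx2 pixels (pinhole model, no lens distortion)."""
    points_world = np.asarray(points_world, dtype=float)
    # World -> camera frame: X_cam = R @ X_world + t
    points_cam = points_world @ R.T + t
    # Apply intrinsics, then divide by depth to get pixel coordinates
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Hypothetical calibration: focal length 1000 px, principal point (640, 360),
# camera placed at the world origin with no rotation.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

# Two hypothetical 3D joints, 2 m in front of the camera (metres).
joints_world = np.array([[0.0, 0.0, 2.0],
                         [0.5, -0.25, 2.0]])
pixels = project_points(joints_world, K, R, t)
print(pixels)
```

With accurate calibration, projected ground-truth joints should land on the corresponding body parts in each synchronized view; a real pipeline would also need to handle lens distortion and the capture system's own coordinate conventions.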
Problem

Research questions and friction points this paper is trying to address.

markerless motion capture
4D human modeling
multi-person interaction
occlusion
real-world dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

markerless motion capture
4D human modeling
multi-person interaction
occlusion handling
SMPL-X