Motion Capture from Inertial and Vision Sensors

📅 2024-07-23
🏛️ arXiv.org
📈 Citations: 5
Influential: 0
🤖 AI Summary
Problem: Accurate, low-cost human motion capture remains challenging in consumer-grade settings. Method: This paper proposes a lightweight multimodal framework that combines a single RGB camera with a small number of IMUs (≤6). It introduces MINIONS, a large-scale, synchronized IMU-visual motion dataset of over 5 million frames covering 146 classes of fine-grained single-person and interactive actions, annotated with joint positions, joint rotations, and SMPL parameters. The framework features cross-modal alignment, joint optimization, and IMU-augmented monocular pose estimation. Contribution/Results: Experiments indicate that, with only one camera and a few IMUs, the approach outperforms vision-only and inertial-only baselines in full-body motion reconstruction. The results point toward a deployable approach to motion capture in everyday environments.

📝 Abstract
Human motion capture is the foundation for many computer vision and graphics tasks. While industrial motion capture systems with complex camera arrays or expensive wearable sensors have been widely adopted in movie and game production, consumer-affordable and easy-to-use solutions for personal applications are still far from mature. To utilize a mixture of a monocular camera and very few inertial measurement units (IMUs) for accurate multi-modal human motion capture in daily life, we contribute MINIONS in this paper, a large-scale Motion capture dataset collected from INertial and visION Sensors. MINIONS has several featured properties: 1) a large scale of over five million frames and 400 minutes in duration; 2) multi-modal data comprising IMU signals and RGB videos labeled with joint positions, joint rotations, SMPL parameters, etc.; 3) a diverse set of 146 fine-grained single and interactive actions with textual descriptions. With the proposed MINIONS, we conduct experiments on multi-modal motion capture and explore the possibilities of consumer-affordable motion capture using a monocular camera and very few IMUs. The experimental results emphasize the unique advantages of inertial and vision sensors, showcasing the promise of consumer-affordable multi-modal motion capture and providing a valuable resource for further research and development.
Problem

Research questions and friction points this paper is trying to address.

Developing affordable motion capture using a monocular camera and a few IMUs
Creating a large multi-modal dataset with inertial and vision sensor data
Exploring the complementary strengths of IMUs and video for motion tracking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining a monocular camera with a few IMUs
Creating a large multi-modal motion capture dataset
Developing SparseNet, a sensor fusion framework
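
To make the modalities listed in the abstract concrete, the following is a minimal sketch of how one synchronized frame of such a dataset might be represented and sanity-checked. Everything here is an assumption for illustration: the names `ImuReading`, `FrameRecord`, `validate`, and `make_dummy_record`, the field layout, and the axis-angle encoding are hypothetical and are not the actual MINIONS file format. The constants 24 (joints) and 10 (shape coefficients) follow the standard SMPL body model; the 146 action classes come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

NUM_SMPL_JOINTS = 24   # standard SMPL body model joint count
NUM_SMPL_BETAS = 10    # standard SMPL shape coefficient count
NUM_ACTIONS = 146      # number of action classes stated in the abstract


@dataclass
class ImuReading:
    """One IMU sample (hypothetical layout)."""
    acceleration: Vec3   # linear acceleration
    orientation: Vec3    # axis-angle rotation (assumed encoding)


@dataclass
class FrameRecord:
    """One synchronized frame (hypothetical layout, not the real format)."""
    video_path: str              # RGB video this frame belongs to
    frame_index: int             # index of the frame within that video
    imus: List[ImuReading]       # readings from the few body-worn IMUs
    joint_positions: List[Vec3]  # 3D position label per joint
    joint_rotations: List[Vec3]  # axis-angle rotation label per joint
    smpl_betas: List[float]      # SMPL shape parameters
    action_id: int               # action class index in [0, NUM_ACTIONS)
    description: str             # textual description of the action


def validate(rec: FrameRecord) -> List[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    if not rec.imus:
        problems.append("no IMU readings")
    if len(rec.joint_positions) != NUM_SMPL_JOINTS:
        problems.append("joint_positions must have 24 entries")
    if len(rec.joint_rotations) != NUM_SMPL_JOINTS:
        problems.append("joint_rotations must have 24 entries")
    if len(rec.smpl_betas) != NUM_SMPL_BETAS:
        problems.append("smpl_betas must have 10 entries")
    if not 0 <= rec.action_id < NUM_ACTIONS:
        problems.append("action_id outside 0..145")
    return problems


def make_dummy_record(num_imus: int = 2) -> FrameRecord:
    """Build an all-zero record, useful only for exercising validate()."""
    zero: Vec3 = (0.0, 0.0, 0.0)
    return FrameRecord(
        video_path="example.mp4",  # placeholder path
        frame_index=0,
        imus=[ImuReading(zero, zero) for _ in range(num_imus)],
        joint_positions=[zero] * NUM_SMPL_JOINTS,
        joint_rotations=[zero] * NUM_SMPL_JOINTS,
        smpl_betas=[0.0] * NUM_SMPL_BETAS,
        action_id=0,
        description="placeholder action description",
    )
```

Calling `validate(make_dummy_record())` returns an empty list, while a record with a missing modality or an out-of-range class index yields one message per problem. Keeping all modalities in a single record is one way to keep the IMU and video streams aligned per frame; a real loader may well store them separately.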