🤖 AI Summary
To address low accuracy in 3D reconstruction and pose tracking, as well as the high annotation cost of hand-object interaction videos, this paper proposes a lightweight capture system that pairs multiple RGB-D cameras with a HoloLens headset to acquire tightly synchronized hand-object interaction data. Methodologically, it integrates multi-camera calibration, HoloLens spatial awareness, and a geometry-constrained semi-automatic 3D annotation framework, significantly reducing annotation effort for joint hand-object pose and shape estimation. It is the first work to systematically cover embodied interaction tasks including grasping, handover, and functional object use. Contributions include: (1) introducing HO-Cap, the first open-source dataset tailored for embodied AI and robotic manipulation, featuring diverse, long-duration interaction sequences; and (2) enabling high-fidelity joint 3D hand-object reconstruction and real-time pose tracking with millimeter- and degree-level accuracy.
📝 Abstract
We introduce a data capture system and a new dataset, HO-Cap, for 3D reconstruction and pose tracking of hands and objects in videos. The system leverages multiple RGB-D cameras and a HoloLens headset for data collection, avoiding the use of expensive 3D scanners or mocap systems. We propose a semi-automatic method for annotating the shape and pose of hands and objects in the collected videos, significantly reducing the annotation time compared to manual labeling. With this system, we captured a video dataset of humans interacting with objects to perform various tasks, including simple pick-and-place actions, handovers between hands, and using objects according to their affordance, which can serve as human demonstrations for research in embodied AI and robot manipulation. Our data capture setup and annotation framework will be available for the community to use in reconstructing 3D shapes of objects and human hands and tracking their poses in videos.
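The abstract does not spell out how the multi-camera setup is used for annotation, but the core geometric primitive behind such semi-automatic labeling is triangulating keypoints (e.g., hand joints) observed by several calibrated cameras. As a minimal sketch, assuming a standard pinhole model and known 3x4 projection matrices per camera (the function name and synthetic cameras below are illustrative, not the paper's actual pipeline):

```python
import numpy as np

def triangulate_point(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from multiple calibrated views.

    projections: list of 3x4 camera projection matrices P_i = K_i [R_i | t_i]
    points_2d:   list of corresponding (u, v) pixel observations, one per view
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each observation contributes two linear constraints on the
        # homogeneous point X: u * (P[2] @ X) = P[0] @ X, and likewise for v.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Homogeneous least-squares solution: right singular vector of A
    # associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Illustrative two-camera setup (identity intrinsics, 1 m baseline along x):
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, 0.1, 2.0])
obs = []
for P in (P1, P2):
    x = P @ np.append(X_true, 1.0)
    obs.append((x[0] / x[2], x[1] / x[2]))
X_est = triangulate_point([P1, P2], obs)  # recovers X_true up to numerical precision
```

With more than two views (as in a multi-RGB-D rig), the same least-squares system simply gains rows, which makes the estimate more robust to per-view detection noise; a real pipeline would add depth constraints and outlier rejection on top.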