GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities

📅 2024-12-05
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing hand-object interaction (HOI) datasets suffer from limited scale, insufficient diversity, and coarse-grained annotations, hindering progress in modeling bimanual activities. Method: We introduce GigaHands, a massive bimanual HOI benchmark comprising 34 hours of multi-view RGB video of 56 subjects manipulating 417 objects. It provides 14k high-fidelity 3D hand-object motion clips, derived from 183 million frames, paired with 84k textual annotations. A fully automatic, markerless acquisition pipeline combines multi-view 3D hand and object pose estimation with a lightweight text-annotation protocol. Contribution/Results: GigaHands surpasses prior benchmarks in scale, object/action diversity, and annotation fidelity. Its scale and diversity enable strong results on downstream tasks, including text-driven motion synthesis, hand-motion captioning, and dynamic radiance field reconstruction.

📝 Abstract
Understanding bimanual human hand activities is a critical problem in AI and robotics. We cannot build large models of bimanual activities because existing datasets lack the scale, coverage of diverse hand activities, and detailed annotations. We introduce GigaHands, a massive annotated dataset capturing 34 hours of bimanual hand activities from 56 subjects and 417 objects, totaling 14k motion clips derived from 183 million frames paired with 84k text annotations. Our markerless capture setup and data acquisition protocol enable fully automatic 3D hand and object estimation while minimizing the effort required for text annotation. The scale and diversity of GigaHands enable broad applications, including text-driven action synthesis, hand motion captioning, and dynamic radiance field reconstruction. Our website is available at https://ivl.cs.brown.edu/research/gigahands.html .
Problem

Research questions and friction points this paper is trying to address.

Lack of large-scale datasets for bimanual hand activities
Insufficient diversity and annotations in existing datasets
Need for automatic 3D hand and object estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Markerless capture setup for 3D estimation
Automatic hand and object motion annotation
Large-scale diverse bimanual activity dataset