🤖 AI Summary
Existing hand–object interaction datasets are predominantly collected in controlled environments, which limits how well models trained on them generalize to complex real-world scenarios. To close this gap, this work proposes a lightweight, markerless multi-camera system, synchronized and calibrated with a user-worn VR headset, that captures high-fidelity 3D hand–object interaction data across diverse in-the-wild settings. By combining a backpack-mounted multi-camera array with a novel ego-exo joint tracking pipeline, the study achieves the first large-scale acquisition and annotation of high-precision 3D hand–object interactions in authentic outdoor environments. The resulting dataset, SHOW3D, is the first in-the-wild 3D hand–object interaction benchmark; it mitigates the longstanding trade-off between environmental realism and annotation accuracy and significantly improves model generalization across multiple downstream tasks.
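To make the annotation idea concrete, below is a minimal sketch of the multi-view geometry an ego-exo tracking pipeline rests on: triangulating a 3D hand keypoint from synchronized 2D detections in calibrated egocentric (headset) and exocentric (backpack rig) views via the standard direct linear transform (DLT). The projection-matrix setup, detection format, and function name are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def triangulate_dlt(projections, points_2d):
    """Triangulate one 3D point from >= 2 calibrated views via DLT.

    projections: list of 3x4 camera projection matrices P = K [R | t]
                 (ego headset cameras and exo rig cameras alike).
    points_2d:   list of (u, v) pixel detections of the same keypoint.
    Returns the least-squares 3D point in world coordinates.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view adds two linear constraints on the homogeneous point X:
        # u * (P[2] @ X) = P[0] @ X   and   v * (P[2] @ X) = P[1] @ X
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector of A with smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenize

if __name__ == "__main__":
    # Toy check: two synthetic cameras observing a known 3D point.
    K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])               # "ego" view at origin
    P2 = K @ np.hstack([np.eye(3), np.array([[-0.3], [0], [0]])])   # "exo" view, 0.3 m baseline
    X_true = np.array([0.1, -0.05, 1.2, 1.0])
    xs = [P @ X_true for P in (P1, P2)]
    uv = [x[:2] / x[2] for x in xs]
    print(triangulate_dlt([P1, P2], uv))  # ~ [0.1, -0.05, 1.2]
```

In a real pipeline, each hand keypoint would be triangulated (or bundle-adjusted) from many more synchronized views, with outlier handling for occluded or mis-detected joints.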
📝 Abstract
Accurate 3D understanding of human hands and objects during manipulation remains a significant challenge for egocentric computer vision. Existing hand-object interaction datasets are predominantly captured in controlled studio settings, which limits both environmental diversity and the ability of models trained on such data to generalize to real-world scenarios. To address this challenge, we introduce a novel marker-less multi-camera system that allows nearly unconstrained mobility in genuinely in-the-wild conditions while still generating precise 3D annotations of hands and objects. The capture system consists of a lightweight, back-mounted multi-camera rig that is synchronized and calibrated with a user-worn VR headset. For 3D ground-truth annotation of hands and objects, we develop an ego-exo tracking pipeline and rigorously evaluate its quality. Finally, we present SHOW3D, the first large-scale dataset with 3D annotations of hands interacting with objects in diverse real-world environments, including outdoor settings. Our approach significantly reduces the fundamental trade-off between environmental realism and the accuracy of 3D annotations, which we validate with experiments on several downstream tasks.

Project page: show3d-dataset.github.io
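As a hedged illustration of what synchronizing a camera rig with a headset can involve, the sketch below recovers a constant clock offset between two devices by cross-correlating motion-magnitude signals (e.g., gyroscope norms) resampled to a common rate. The signal source, sampling rate, and function name are assumptions for the example; the paper's actual synchronization and calibration procedure may differ.

```python
import numpy as np

def estimate_time_offset(sig_a, sig_b, rate_hz):
    """Estimate how far sig_b lags sig_a (in seconds) via cross-correlation.

    sig_a, sig_b: 1D motion-magnitude signals (e.g., |gyro|) from the two
                  devices, resampled to the same rate `rate_hz`.
    Returns a positive offset when sig_b is delayed relative to sig_a.
    """
    # Normalize so amplitude differences between sensors do not bias the peak.
    a = (sig_a - sig_a.mean()) / (sig_a.std() + 1e-9)
    b = (sig_b - sig_b.mean()) / (sig_b.std() + 1e-9)
    corr = np.correlate(b, a, mode="full")
    lag = int(np.argmax(corr)) - (len(a) - 1)  # lag in samples
    return lag / rate_hz

if __name__ == "__main__":
    # Toy check: a 0.25 s delayed copy of a motion-like signal at 100 Hz.
    rate = 100.0
    t = np.arange(0.0, 10.0, 1.0 / rate)
    motion = lambda s: np.abs(np.sin(2 * s) + 0.3 * np.sin(7 * s))
    rng = np.random.default_rng(0)
    a = motion(t) + 0.01 * rng.standard_normal(t.size)
    b = motion(t - 0.25) + 0.01 * rng.standard_normal(t.size)
    print(estimate_time_offset(a, b, rate))  # ~ 0.25
```

Cross-correlating inertial magnitudes is a common lightweight synchronization heuristic; a production pipeline would additionally need sub-sample refinement and clock-drift estimation.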