🤖 AI Summary
To address the lack of force perception modeling and cross-view transfer research in articulated object manipulation, this paper introduces HOI-Force, the first force-grounded, multi-view synchronized multimodal manipulation dataset. HOI-Force comprises 3,048 manipulation sequences across 381 objects and 38 environments, with high-precision temporal synchronization of visual, six-degree-of-freedom (6-DoF) force, and tactile signals across four embodiments: the human hand, the human hand with a wrist-mounted camera, a handheld UMI gripper, and a custom Hoi! gripper. Its core contributions are threefold: (1) real-world force annotations spanning diverse embodiments; (2) support for force prediction, cross-view imitation learning, and joint visuo-tactile-force representation learning; and (3) the largest and most modally comprehensive benchmark to date for articulated object manipulation. The dataset and code are publicly released.
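To make the force-prediction task concrete, below is a minimal sketch of what a baseline could look like: regressing a 6-DoF end-effector wrench from a single RGB frame. This is not the paper's method; the architecture, names, and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ForceFromVideo(nn.Module):
    """Toy wrench-regression baseline. Hypothetical: the architecture,
    shapes, and names are assumptions, not the paper's model."""

    def __init__(self) -> None:
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 6)  # (fx, fy, fz, tx, ty, tz)

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        # rgb: (B, 3, H, W) -> predicted 6-DoF wrench (B, 6)
        return self.head(self.backbone(rgb))

# Smoke test with a dummy batch of frames
model = ForceFromVideo()
wrench = model(torch.randn(2, 3, 224, 224))  # -> shape (2, 6)
```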
📝 Abstract
We present a dataset for force-grounded, cross-view articulated manipulation that couples what is seen with what is done and what is felt during real human interaction. The dataset contains 3,048 sequences across 381 articulated objects in 38 environments. Each object is operated under four embodiments: (i) a human hand, (ii) a human hand with a wrist-mounted camera, (iii) a handheld UMI gripper, and (iv) a custom Hoi! gripper, where the tool embodiments provide synchronized end-effector forces and tactile sensing. Our dataset offers a holistic view of interaction understanding from video, enabling researchers not only to evaluate how well methods transfer between human and robotic viewpoints, but also to investigate underexplored modalities such as force sensing and prediction.
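The synchronized multimodal layout can be pictured as one record per timestep, as in the hypothetical sketch below. All field names and shapes are assumptions; the abstract does not specify the released data format.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HOIForceFrame:
    """One time-synchronized sample from a manipulation sequence.

    Field names and shapes are hypothetical placeholders, not the
    dataset's actual schema.
    """
    rgb: np.ndarray        # (H, W, 3) camera frame for this embodiment
    wrench: np.ndarray     # (6,) end-effector force/torque (tool embodiments)
    tactile: np.ndarray    # tactile reading; shape depends on the sensor
    embodiment: str        # "hand" | "hand_wrist_cam" | "umi" | "hoi"
    timestamp: float       # seconds on a clock shared across modalities

def nearest_index(timestamps: np.ndarray, query_t: float) -> int:
    """Nearest-neighbor alignment of one modality's timeline to another.

    Purely illustrative: the dataset is described as already
    synchronized, so this is only needed when resampling externally.
    """
    return int(np.argmin(np.abs(timestamps - query_t)))
```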