HUMOTO: A 4D Dataset of Mocap Human Object Interactions

📅 2025-04-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address key bottlenecks in human-object interaction (HOI) datasets—including physical implausibility, task-logical discontinuity, and severe occlusion—this paper introduces HUMOTO, a high-fidelity 4D dataset featuring synchronized full-body human and multi-object interactions. Methodologically, HUMOTO integrates: (1) a scene-driven LLM-based script generation pipeline that ensures semantic coherence and natural temporal progression; (2) hybrid optical motion capture with a multi-view synchronized video system, coupled with precise 3D modeling of 63 objects and 72 articulated parts, significantly mitigating occlusion and interpenetration; and (3) expert-guided post-processing by professional artists to enhance physical plausibility. The dataset comprises 736 real-world action sequences (7,875 seconds at 30 fps) spanning diverse tasks such as cooking and picnicking. It supports three core research directions—motion generation, visual understanding, and robot learning—and provides benchmarks against existing datasets in physical consistency, task coherence, and application breadth.

📝 Abstract
We present Human Motions with Objects (HUMOTO), a high-fidelity dataset of human-object interactions for motion generation, computer vision, and robotics applications. Featuring 736 sequences (7,875 seconds at 30 fps), HUMOTO captures interactions with 63 precisely modeled objects and 72 articulated parts. Our innovations include a scene-driven LLM scripting pipeline creating complete, purposeful tasks with natural progression, and a mocap-and-camera recording setup to effectively handle occlusions. Spanning diverse activities from cooking to outdoor picnics, HUMOTO preserves both physical accuracy and logical task flow. Professional artists rigorously clean and verify each sequence, minimizing foot sliding and object penetrations. We also provide benchmarks compared to other datasets. HUMOTO's comprehensive full-body motion and simultaneous multi-object interactions address key data-capturing challenges and provide opportunities to advance realistic human-object interaction modeling across research domains with practical applications in animation, robotics, and embodied AI systems. Project: https://jiaxin-lu.github.io/humoto/ .
Problem

Research questions and friction points this paper is trying to address.

Creating a high-fidelity dataset for human-object interaction modeling
Addressing occlusion challenges in mocap and camera recordings
Ensuring physical accuracy and logical task flow in interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scene-driven LLM scripting pipeline
Mocap-and-camera occlusion handling setup
Professional artist-cleaned motion sequences