🤖 AI Summary
Existing research lacks high-quality 4D human-object-human interaction data for collaborative object rearrangement, hindering progress in VR/AR and robotics for everyday assistive tasks. Method: We introduce CORE4D, a novel large-scale 4D collaborative rearrangement dataset in which 1K real-world captures are augmented to 11K sequences in total, spanning 3K real and virtual object shapes and covering diverse object geometries, collaboration modes, and 3D scenes. We propose an iterative collaboration retargeting strategy that generalizes captured motions across object shapes and collaborative roles, and that integrates geometry-aware motion retargeting, parametric object modeling, and collaborative relation modeling. Contribution/Results: We benchmark two human-object interaction generation tasks on CORE4D, human-object motion forecasting and interaction synthesis. Experiments demonstrate the effectiveness of the retargeting strategy and expose critical limitations of mainstream HOI generation methods in complex collaborative interactions, making CORE4D a more challenging HOI evaluation benchmark for future research.
📝 Abstract
Understanding how humans cooperatively rearrange household objects is critical for VR/AR and human-robot interaction. However, modeling these behaviors remains under-researched due to the lack of relevant datasets. We fill this gap by presenting CORE4D, a novel large-scale 4D human-object-human interaction dataset focusing on collaborative object rearrangement, which encompasses diverse combinations of object geometries, collaboration modes, and 3D scenes. Starting from 1K human-object-human motion sequences captured in the real world, we enrich CORE4D with an iterative collaboration retargeting strategy that augments these motions to a variety of novel objects. With this approach, CORE4D comprises a total of 11K collaboration sequences spanning 3K real and virtual object shapes. Drawing on the extensive motion patterns in CORE4D, we benchmark two human-object interaction generation tasks: human-object motion forecasting and interaction synthesis. Extensive experiments demonstrate the effectiveness of our collaboration retargeting strategy and indicate that CORE4D poses new challenges to existing human-object interaction generation methods.