🤖 AI Summary
Existing methods struggle to generate physically plausible, long-horizon dynamic human-object interaction (HOI) motions. The main reasons are that HOI datasets contain mostly static interactions and that no pretrained model covers both dynamic full-body motion and object interaction. This work proposes a framework that combines a pretrained human motion diffusion prior with pretrained imitation agents. In the planning phase, the diffusion prior augments HOI data with dynamic motion, and the framework then plans dynamic HOI sequences from it. In the execution phase, a composer network blends the actions of an expert for dynamic human motion and an expert for static HOI, composing their skills spatio-temporally. Together, the diffusion-based data augmentation and the modular, spatio-temporally blended controller enable long-term dynamic HOI motions such as running while holding a table. The approach improves success rates while maintaining interaction across dynamic HOI tasks, and it reaches competitive performance with substantially less training time.
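The summary does not say how the composer blends the experts. The sketch below assumes one plausible scheme. At every timestep the composer outputs one logit per expert per action dimension, and a softmax over experts turns those logits into blending weights. Per-dimension weights give the spatial part of the composition, such as legs following the locomotion expert and arms following the HOI expert. Recomputing the weights from each new observation gives the temporal part. `LinearComposer` is a hypothetical stand-in for the learned network, and the random expert actions are placeholders for real policy outputs.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def compose_actions(expert_actions: np.ndarray, composer_logits: np.ndarray) -> np.ndarray:
    """Blend K expert actions with per-dimension (spatial) weights.

    expert_actions:  (K, D) actions proposed by K frozen experts at one timestep.
    composer_logits: (K, D) composer outputs for the same timestep.
    Returns a (D,) action; each entry is a convex combination of the experts' entries.
    """
    if expert_actions.shape != composer_logits.shape:
        raise ValueError("actions and logits must share the same (K, D) shape")
    weights = softmax(composer_logits, axis=0)  # sums to 1 over experts, per dimension
    return (weights * expert_actions).sum(axis=0)


class LinearComposer:
    """Hypothetical stand-in for the composer: observation -> (K, D) logits.

    Recomputing the logits from each new observation is what makes the
    weights vary over time (the temporal part of the composition).
    """

    def __init__(self, obs_dim: int, num_experts: int, action_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.num_experts, self.action_dim = num_experts, action_dim
        self.W = 0.1 * rng.standard_normal((num_experts * action_dim, obs_dim))

    def __call__(self, obs: np.ndarray) -> np.ndarray:
        return (self.W @ obs).reshape(self.num_experts, self.action_dim)


if __name__ == "__main__":
    composer = LinearComposer(obs_dim=8, num_experts=2, action_dim=4)
    rng = np.random.default_rng(1)
    for t in range(3):
        obs = rng.standard_normal(8)
        # Random placeholders for the two experts' actions (e.g. locomotion vs. static HOI).
        expert_actions = rng.standard_normal((2, 4))
        action = compose_actions(expert_actions, composer(obs))
        print(t, np.round(action, 3))
```

A softmax keeps every blended action inside the range spanned by the experts, which is one way to reuse frozen experts without drifting far from what they were trained to produce. Other parameterizations, such as unnormalized gates or residual corrections, are equally possible.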
📝 Abstract
Generating physically plausible dynamic motions of human-object interaction (HOI) remains challenging, mainly because existing HOI datasets are limited to static interactions, and pretrained agents can perform either dynamic full-body motions without objects or static HOI motions, but not both. Recent works such as InsActor and CLoSD generate HOI motions in separate planning and execution stages, yet they are limited to static or short-term contacts, e.g., striking. In this work, we propose a framework that achieves dynamic, long-term interaction motions, such as running while holding a table, by combining pretrained motion priors and imitation agents in the planning and execution stages. In the planning stage, we augment HOI datasets with dynamic priors from a pretrained human motion diffusion model and then generate the corresponding object trajectories, which yields planned dynamic HOI sequences. In the execution stage, a composer network blends the actions of pretrained imitation agents specialized for either dynamic human motions or static HOI motions, enabling spatio-temporal composition of their complementary skills. Compared with relevant prior art, our method consistently improves success rates while maintaining interaction in dynamic HOI tasks. Furthermore, blending pretrained experts with our composer achieves competitive performance with significantly reduced training time. Ablation studies validate the effectiveness of our augmentation and composer blending.
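The abstract does not specify how object trajectories are generated from the augmented human motion. As an illustration only, the sketch below uses a simple rigid-carry heuristic. Once the object is grasped, it keeps its initial offset from the midpoint of the two hands, and that offset rotates about the vertical axis with the hand-to-hand direction. The function name, the z-up convention, and the heuristic itself are assumptions made for this example. The sketch is not the authors' procedure, and it omits the diffusion-based augmentation step entirely.

```python
import numpy as np


def object_trajectory_from_hands(left_hand: np.ndarray, right_hand: np.ndarray,
                                 obj_pos0: np.ndarray):
    """Hypothetical rigid-carry heuristic (not the paper's method).

    left_hand, right_hand: (T, 3) hand positions from a planned human motion, z up.
    obj_pos0: (3,) object position at the grasp frame t = 0.
    Returns object positions (T, 3) and yaw (T,) in radians relative to t = 0.
    """
    mid = 0.5 * (left_hand + right_hand)                  # grasp centre per frame
    axis = right_hand - left_hand                         # hand-to-hand direction
    yaw = np.unwrap(np.arctan2(axis[:, 1], axis[:, 0]))   # heading in the ground plane
    yaw_rel = yaw - yaw[0]

    # Keep the initial object-to-grasp offset, rotated about z with the hands.
    off = obj_pos0 - mid[0]
    c, s = np.cos(yaw_rel), np.sin(yaw_rel)
    offsets = np.stack([c * off[0] - s * off[1],
                        s * off[0] + c * off[1],
                        np.full_like(yaw_rel, off[2])], axis=1)
    return mid + offsets, yaw_rel
```

For example, if both hands move 1 m along +y while staying level, the object moves 1 m along +y as well. If the hands turn 90 degrees about their midpoint, the object's offset turns with them. A trajectory planned this way would still need a physics-based execution stage to become physically plausible.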