🤖 AI Summary
This work addresses key challenges in physics-based human-object interaction (HOI) simulation: inaccurate contact annotations in motion-capture (MoCap) data, insufficient hand-level detail, and limited object geometric diversity. The authors propose a curriculum-based teacher-student distillation framework: first, subject-specific teacher policies mimic and refine raw MoCap sequences with contact-aware corrections; second, knowledge is distilled into a single generalizable student policy, with the teachers acting as online experts providing direct supervision; third, reinforcement-learning fine-tuning pushes the student beyond pure imitation toward more physically plausible, higher-quality motion. The method requires only a few hours of imperfect MoCap data to learn diverse, high-fidelity whole-body HOI skills, generalizes zero-shot across multiple HOI benchmarks while producing physically consistent and visually realistic motions, and integrates seamlessly into existing motion generation pipelines, advancing HOI modeling from imitation learning toward controllable, physics-aware generation.
📝 Abstract
Achieving realistic simulations of humans interacting with a wide range of objects has long been a fundamental goal. Extending physics-based motion imitation to complex human-object interactions (HOIs) is challenging due to intricate human-object coupling, variability in object geometries, and artifacts in motion capture data, such as inaccurate contacts and limited hand detail. We introduce InterMimic, a framework that enables a single policy to robustly learn from hours of imperfect MoCap data covering diverse full-body interactions with dynamic and varied objects. Our key insight is to employ a curriculum strategy -- perfect first, then scale up. We first train subject-specific teacher policies to mimic, retarget, and refine motion capture data. Next, we distill these teachers into a student policy, with the teachers acting as online experts providing direct supervision, as well as high-quality references. Notably, we incorporate RL fine-tuning on the student policy to surpass mere demonstration replication and achieve higher-quality solutions. Our experiments demonstrate that InterMimic produces realistic and diverse interactions across multiple HOI datasets. The learned policy generalizes in a zero-shot manner and seamlessly integrates with kinematic generators, elevating the framework from mere imitation to generative modeling of complex human-object interactions.
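The three-stage curriculum described above (per-subject teachers → online distillation → RL fine-tuning) can be illustrated with a heavily simplified toy sketch. Everything here is a stand-in for exposition only, not the paper's implementation: the "teachers" and "student" are linear maps rather than simulation-trained policies, the distillation loop is a DAgger-style online imitation step, and the fine-tuning reward is a hypothetical surrogate mixing imitation with a small action-magnitude penalty in place of real physics terms.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, ACT, N_SUBJ = 4, 2, 3

# Stage 1 (toy stand-in): one "teacher" per MoCap subject. A shared linear
# map plus a per-subject action offset stands in for subject-specific
# policies that have already mimicked and refined that subject's clips.
W_true = rng.standard_normal((OBS, ACT))
subj_bias = rng.standard_normal((N_SUBJ, ACT))

def teacher_act(sid, obs):
    return obs @ W_true + subj_bias[sid]

def student_act(sid, obs, W):
    # The student conditions on the subject via a one-hot, so a single
    # policy can absorb all teachers.
    x = np.concatenate([obs, np.eye(N_SUBJ)[sid]])
    return x @ W

# Stage 2: DAgger-style online distillation -- the student queries the
# relevant teacher for action labels on states it visits itself.
W_s = np.zeros((OBS + N_SUBJ, ACT))
lr = 0.05
for _ in range(4000):
    sid = int(rng.integers(N_SUBJ))
    obs = rng.standard_normal(OBS)
    x = np.concatenate([obs, np.eye(N_SUBJ)[sid]])
    err = x @ W_s - teacher_act(sid, obs)
    W_s -= lr * np.outer(x, err)  # SGD step on 0.5 * ||err||^2

# Stage 3 (toy): "RL fine-tuning" by random search on a surrogate reward
# mixing imitation with an extra penalty the teachers ignore (a small
# action-magnitude cost standing in for physics/energy objectives).
eval_obs = rng.standard_normal((60, OBS))
eval_sid = rng.integers(N_SUBJ, size=60)

def reward(W):
    r = 0.0
    for obs, sid in zip(eval_obs, eval_sid):
        a = student_act(sid, obs, W)
        t = teacher_act(sid, obs)
        r += -np.sum((a - t) ** 2) - 0.01 * np.sum(a ** 2)
    return r

r0 = reward(W_s)   # reward after distillation, before fine-tuning
best = r0
for _ in range(100):
    cand = W_s + 0.01 * rng.standard_normal(W_s.shape)
    rc = reward(cand)
    if rc > best:  # keep perturbations that improve the surrogate reward
        W_s, best = cand, rc
```

The sketch captures the key structural idea: distillation alone can only replicate the teachers, while the fine-tuning stage optimizes an objective the demonstrations do not encode, which is what lets the actual method surpass mere demonstration replication.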