🤖 AI Summary
This work addresses the challenge of enabling morphologically dissimilar agents to learn full-body physical interaction skills—such as dancing, handshaking, and martial arts—from human demonstrations. We propose the Embedded Interaction Graph (EIG), a compact, transferable spatiotemporal representation of interaction dynamics, which enables cross-morphology, target-free, and alignment-free physical interaction imitation for the first time. Our method integrates graph neural networks with physics-based simulation, using the EIG as a unified imitation objective for policy learning. Experiments across diverse robotic platforms demonstrate successful reproduction of complex interactive tasks—including dancing, rock-paper-scissors, and handshaking—while preserving both semantic motion fidelity and physical feasibility. The approach significantly extends the applicability boundary of imitation learning to heterogeneous embodied agents, overcoming longstanding limitations in morphological generalization, manual task specification, and inter-agent kinematic alignment.
📝 Abstract
Learning physical interaction skills, such as dancing, handshaking, or sparring, remains a fundamental challenge for agents operating in human environments, particularly when the agent's morphology differs significantly from that of the demonstrator. Existing approaches often rely on handcrafted objectives or morphological similarity, limiting their capacity for generalization. Here, we introduce BuddyImitation, a framework that enables agents with diverse embodiments to learn whole-body interaction behaviors directly from human demonstrations. The framework extracts a compact, transferable representation of interaction dynamics, called the Embedded Interaction Graph (EIG), which captures key spatiotemporal relationships between the interacting agents. This graph is then used as an imitation objective to train control policies in physics-based simulations, allowing the agent to generate motions that are both semantically meaningful and physically feasible. We demonstrate BuddyImitation on multiple agents, including humans, quadrupedal robots with manipulators, and mobile manipulators, and across various interaction scenarios, including sparring, handshaking, rock-paper-scissors, and dancing. Our results demonstrate a promising path toward coordinated behaviors across morphologically distinct characters via cross-embodiment interaction learning.
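To make the idea of an interaction graph concrete, the following is a minimal, hypothetical sketch in Python. It assumes an EIG time-slice can be represented as edges between two agents' keypoints carrying pairwise distances, and that imitation reduces to matching those edge features; the keypoint names, the distance-only edge feature, and both function names are illustrative assumptions, not the paper's actual definition.

```python
import math

def build_eig_frame(agent_a, agent_b):
    """Build one time-slice of a hypothetical interaction graph.

    agent_a, agent_b: dicts mapping keypoint name -> (x, y, z) position.
    Returns a dict mapping (keypoint_a, keypoint_b) -> Euclidean distance,
    i.e. a fully connected bipartite edge set between the two agents.
    """
    edges = {}
    for name_a, pos_a in agent_a.items():
        for name_b, pos_b in agent_b.items():
            edges[(name_a, name_b)] = math.dist(pos_a, pos_b)
    return edges

def eig_imitation_cost(demo_edges, agent_edges):
    """Mean squared discrepancy between demonstrated and reproduced edge
    features -- a stand-in for using the graph as an imitation objective.
    Only edges present in both graphs are compared, so the measure is
    agnostic to each agent's specific morphology."""
    shared = demo_edges.keys() & agent_edges.keys()
    return sum((demo_edges[k] - agent_edges[k]) ** 2 for k in shared) / len(shared)

# Example: a handshake-like frame with two keypoints per agent.
human = {"hand": (0.5, 0.0, 1.0), "head": (0.0, 0.0, 1.7)}
robot = {"gripper": (0.6, 0.0, 1.0), "base": (1.2, 0.0, 0.3)}
frame = build_eig_frame(human, robot)
```

Because the cost only compares shared edges, a quadruped's gripper and a humanoid's hand can play the same structural role in the graph, which is the intuition behind a morphology-agnostic imitation objective.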