🤖 AI Summary
Enhancing the naturalness and compliance of human–robot interaction (HRI) remains challenging, particularly reproducing the human-like multimodal coordination dynamics observed in human–human interaction (HHI).
Method: We propose an HMM-prior-driven variational autoencoder (HMM-VAE), the first method to integrate a hidden Markov model as a structural prior over the VAE latent space for joint modeling of bidirectional interaction. The framework learns motion dynamics in joint space and adapts the generated motions with inverse kinematics to satisfy task-space reachability, and it introduces HMM-segmentation-guided dynamic stiffness modulation for compliant contact control.
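As a hedged illustration of what "an HMM as a structural prior over the latent space" can mean, the sketch below scores a latent trajectory z_1..z_T with the forward algorithm, i.e. computes log p(z_1..z_T) under an HMM with Gaussian emissions. This is the kind of term that would take the place of the usual i.i.d. standard-normal prior in a VAE objective. It is a minimal sketch, assuming Gaussian emissions and already-known parameters; the names `pi`, `A`, `means`, `covs` and the helper functions are hypothetical, and how the paper couples this prior with its encoder and KL term is not specified here.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian N(mean, cov) evaluated at x."""
    d = x.shape[-1]
    diff = x - mean
    # Cholesky factor gives a stable solve and log-determinant.
    L = np.linalg.cholesky(cov)
    sol = np.linalg.solve(L, diff)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + sol @ sol)

def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a))) along an axis (or over all entries)."""
    a = np.asarray(a)
    m = np.max(a, axis=axis, keepdims=True)
    s = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return s.item() if axis is None else np.squeeze(s, axis=axis)

def hmm_log_likelihood(z, pi, A, means, covs):
    """log p(z_1..z_T) under a K-state HMM with Gaussian emissions (forward algorithm).

    z: (T, D) latent trajectory; pi: (K,) initial state probabilities;
    A: (K, K) row-stochastic transition matrix; means: (K, D); covs: (K, D, D).
    """
    T, K = z.shape[0], pi.shape[0]
    # Emission log-likelihoods log b_k(z_t), shape (T, K).
    log_B = np.array([[gaussian_logpdf(z[t], means[k], covs[k]) for k in range(K)]
                      for t in range(T)])
    log_A = np.log(A)
    log_alpha = np.log(pi) + log_B[0]
    for t in range(1, T):
        # alpha_t(j) = b_j(z_t) * sum_i alpha_{t-1}(i) * A[i, j], in log space.
        log_alpha = log_B[t] + logsumexp(log_alpha[:, None] + log_A, axis=0)
    return logsumexp(log_alpha)
```

Working in log space keeps the recursion stable for long trajectories; with a single state (or several identical states) the result reduces to the sum of independent Gaussian log-densities, which is a convenient sanity check.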
Contribution/Results: Trained solely on dyadic HHI data from just two people, the method generalizes effectively to diverse human partners without robot-specific retraining. Experiments on a humanoid robot demonstrate robust real-time performance. A user study confirms that the generated motions are perceived as significantly more human-like, timely, and accurate, and that the method receives substantially higher preference ratings than baseline approaches.
📝 Abstract
This article presents a method for learning well-coordinated Human-Robot Interaction (HRI) from Human-Human Interactions (HHI). We devise a hybrid approach that uses Hidden Markov Models (HMMs) as latent space priors for a Variational Autoencoder to model a joint distribution over the interacting agents. We leverage the interaction dynamics learned from HHI to learn HRI and incorporate the conditional generation of robot motions from human observations into training, thereby predicting more accurate robot trajectories. The generated robot motions are further adapted with Inverse Kinematics to ensure the desired physical proximity to the human, combining the ease of joint space learning with accurate task space reachability. For contact-rich interactions, we modulate the robot's stiffness using the HMM segmentation to achieve compliant interaction. We verify the effectiveness of our approach, deployed on a humanoid robot, through a user study. Our method generalizes well to various humans despite being trained on data from just two humans. Users perceive our method as more human-like, timely, and accurate, and rank it above the other baselines in preference.
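The abstract does not detail how robot motion is generated conditionally from human observations inside the VAE. As a hedged analogy only, the sketch below shows Gaussian conditioning in the style of Gaussian Mixture Regression, a standard technique for HMM-based imitation, not necessarily the paper's mechanism. It assumes each HMM state stores a joint Gaussian over stacked [human; robot] features, with the first `dh` dimensions belonging to the human; per-state conditional robot means are blended with state weights, which could for instance be normalized forward probabilities. All function names are hypothetical.

```python
import numpy as np

def condition_gaussian(mu, sigma, x_h, dh):
    """Condition a joint Gaussian over [human; robot] on an observed human part x_h.

    Returns the mean and covariance of the robot part given x_h.
    """
    mu_h, mu_r = mu[:dh], mu[dh:]
    S_hh, S_hr = sigma[:dh, :dh], sigma[:dh, dh:]
    S_rr = sigma[dh:, dh:]
    # Gain S_rh S_hh^{-1}, computed with a linear solve instead of an explicit inverse
    # (uses symmetry of sigma: S_rh = S_hr^T).
    gain = np.linalg.solve(S_hh, S_hr).T
    mean_r = mu_r + gain @ (x_h - mu_h)
    cov_r = S_rr - gain @ S_hr
    return mean_r, cov_r

def predict_robot(x_h, weights, mus, sigmas, dh):
    """Blend per-state conditional robot means using normalized state weights."""
    means = np.array([condition_gaussian(mu, s, x_h, dh)[0] for mu, s in zip(mus, sigmas)])
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.sum(w[:, None] * means, axis=0)
```

For a single state with unit variances and correlation 0.8, observing a human value of 1.0 yields a robot mean of 0.8 with variance 0.36: the correlation learned from demonstrations determines how strongly the robot responds to the human.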