🤖 AI Summary
Existing approaches to physical human-robot collaboration (pHRC) rely solely on short-term motion data, which leads to inaccurate intent estimation, limited multi-step prediction capability, and inflexible role allocation. To address these challenges, this paper proposes a dual-branch Transformer-based conditional variational autoencoder (CVAE) that jointly models the force and trajectory modalities and embeds human biomechanical modeling into long-horizon collaborative prediction. It further integrates differential cooperative game theory (DCGT) to enable intent-driven, real-time role reassignment. The method fuses human-guided multimodal sensing with obstacle-aware trajectory prediction. Experimental results demonstrate a 23.6% improvement in intent recognition accuracy, 94.1% role allocation rationality, and a 17.3% gain in collaboration efficiency over state-of-the-art methods, enhancing robotic autonomy and dynamic adaptability.
📝 Abstract
In physical Human-Robot Collaboration (pHRC), accurate human intent estimation and rational human-robot role allocation are crucial for safe and efficient assistance. Existing methods estimate intention from short-term motion data and lack multi-step prediction capabilities. This hinders their ability to sense intent changes and adjust human-robot role assignments autonomously, which can lead to discrepancies between robot behavior and human intent. To address these issues, we propose a Dual Transformer-based Robot Trajectron (DTRT) with a hierarchical architecture. DTRT uses human-guided motion and force data to rapidly capture changes in human intent, enabling accurate trajectory prediction and dynamic adjustment of robot behavior for effective collaboration. Specifically, human intent estimation in DTRT uses two Transformer-based Conditional Variational Autoencoders (CVAEs), which combine robot motion data for the obstacle-free case with human-guided trajectory and force data for obstacle avoidance. Additionally, Differential Cooperative Game Theory (DCGT) is employed to synthesize the predictions based on human-applied forces, so that robot behavior aligns with human intention. Compared to state-of-the-art (SOTA) methods, DTRT incorporates human dynamics into long-term prediction, providing an accurate understanding of intention and enabling rational role allocation, which in turn supports robot autonomy and maneuverability. Experiments demonstrate DTRT's accurate intent estimation and superior collaboration performance.