🤖 AI Summary
Existing reinforcement learning approaches for quadrupedal robots generalize poorly across morphologically distinct platforms: transferring a policy typically requires hyperparameter re-tuning and still degrades performance. To address this, the authors propose Morphology-Control-Aware Reinforcement Learning (McARL), a framework that feeds a randomly sampled morphology vector into both the policy (actor) and value (critic) networks, yielding a morphology-conditioned control policy. McARL achieves zero-shot cross-platform transfer and analyzes how morphological distance affects transfer performance. Built upon PPO, McARL is validated across multiple simulation environments and real-world platforms (Go1, Go2, A1, Mini Cheetah). A single policy trained on Go1 reaches 6.0 m/s on Go1 and transfers zero-shot to Go2 at up to 3.5 m/s. Transfer performance on Go2, Mini Cheetah, and A1 improves by 44–150% over PPO variants.
📝 Abstract
We present Morphology-Control-Aware Reinforcement Learning (McARL), a new approach that addresses the challenges of hyperparameter tuning and transfer loss, enabling generalizable locomotion across robot morphologies. We condition the policy on morphology by incorporating a randomized morphology vector, sampled from a defined morphology range, into both the actor and critic networks. This allows the policy to learn parameters that generalize to robots with similar characteristics. We demonstrate that a single policy trained on a Unitree Go1 robot using McARL can be transferred to a different morphology (e.g., a Unitree Go2 robot) and can achieve a zero-shot transfer velocity of up to 3.5 m/s without retraining or fine-tuning. Moreover, it achieves 6.0 m/s on the Go1 training robot and generalizes to other morphologies such as A1 and Mini Cheetah. We also analyze the impact of morphology distance on transfer performance and highlight McARL's advantages over prior approaches. McARL achieves 44–150% higher transfer performance on Go2, Mini Cheetah, and A1 compared to PPO variants.
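The morphology conditioning described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the morphology parameters and their ranges, the observation and action dimensions, the network sizes, and all function names below are assumptions chosen for illustration. The sketch samples a morphology vector uniformly from a defined range, normalizes it, and concatenates it with the observation before passing it to both an actor and a critic network.

```python
import numpy as np

# Hypothetical morphology ranges (low, high). These parameters and values are
# illustrative assumptions, not taken from the paper.
MORPH_RANGES = {
    "trunk_mass_kg": (4.0, 14.0),
    "thigh_length_m": (0.18, 0.25),
    "calf_length_m": (0.18, 0.25),
}
LOWS = np.array([lo for lo, _ in MORPH_RANGES.values()])
HIGHS = np.array([hi for _, hi in MORPH_RANGES.values()])

OBS_DIM, ACT_DIM = 48, 12      # assumed proprioceptive observation / joint action sizes
MORPH_DIM = len(MORPH_RANGES)


def sample_morphology(rng):
    """Draw one morphology vector uniformly from the defined range."""
    return rng.uniform(LOWS, HIGHS)


def normalize_morphology(m):
    """Scale each morphology parameter to [-1, 1] so inputs share a common scale."""
    return 2.0 * (m - LOWS) / (HIGHS - LOWS) - 1.0


def init_mlp(rng, sizes):
    """Create (weight, bias) pairs for a small fully connected network."""
    return [(rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Forward pass: tanh hidden layers, linear output layer."""
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b


rng = np.random.default_rng(0)

# Both networks take the observation concatenated with the morphology vector.
actor = init_mlp(rng, [OBS_DIM + MORPH_DIM, 64, ACT_DIM])
critic = init_mlp(rng, [OBS_DIM + MORPH_DIM, 64, 1])

obs = rng.normal(size=OBS_DIM)                      # placeholder observation
morph = normalize_morphology(sample_morphology(rng))
conditioned_input = np.concatenate([obs, morph])

action_mean = mlp(actor, conditioned_input)         # actor output (e.g. joint targets)
value = mlp(critic, conditioned_input)[0]           # critic's scalar value estimate
```

In a full training loop, a new morphology vector would presumably be resampled per episode or per environment instance, and at transfer time the vector would be set to the target robot's parameters. Both of these details are assumptions here rather than statements from the abstract.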