🤖 AI Summary
Control policies trained for a single legged-robot morphology generalize poorly to other designs. To address this, the paper proposes a two-stage teacher–student framework. First, a dedicated reinforcement learning policy (teacher) is trained for each of five distinct legged morphologies. Second, the teachers' knowledge is distilled into a single morphology-agnostic student policy implemented as a Transformer. The student uses self-attention to model relationships between joints across different kinematic structures, which supports zero-shot generalization to unseen configurations. In experiments, the student reaches 94.47% of teacher performance on the training morphologies and 72.64% on entirely unseen ones, and Transformer students consistently outperform MLP baselines. The student policy is also deployed successfully on a physical quadruped robot. The work is described as the first application of Transformers to cross-morphology policy distillation in legged robotics, offering a scalable route toward universal locomotion controllers.
📝 Abstract
Developing controllers that generalize across diverse robot morphologies remains a significant challenge in legged locomotion. Traditional approaches either create specialized controllers for each morphology or compromise performance for generality. This paper introduces a two-stage teacher-student framework that bridges this gap through policy distillation. First, we train specialized teacher policies optimized for individual morphologies, capturing the unique optimal control strategies for each robot design. Then, we distill this specialized expertise into a single Transformer-based student policy capable of controlling robots with varying leg configurations. Our experiments across five distinct legged morphologies demonstrate that our approach preserves morphology-specific optimal behaviors, with the Transformer architecture achieving 94.47% of teacher performance on training morphologies and 72.64% on unseen robot designs. Comparative analysis reveals that Transformer-based architectures consistently outperform MLP baselines by leveraging attention mechanisms to effectively model joint relationships across different kinematic structures. We validate our approach through successful deployment on a physical quadruped robot, demonstrating the practical viability of our morphology-agnostic control framework. This work presents a scalable solution for developing universal legged robot controllers that maintain near-optimal performance while generalizing across diverse morphologies.
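To make the distillation stage concrete, the following is a minimal NumPy sketch of how one student might consume robots with different joint counts and be scored against a teacher. It is not the authors' implementation: the per-joint tokenization, the single attention head, the mean-squared-error imitation loss, and every name and dimension (`student_actions`, `distillation_loss`, `MAX_JOINTS`, `OBS_DIM`, `D_MODEL`) are assumptions made for illustration, and the teacher actions are random stand-ins rather than outputs of a trained policy.

```python
import numpy as np

def masked_self_attention(tokens, mask, w_q, w_k, w_v):
    """Single-head self-attention over joint tokens; padded joints are never attended to."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask[None, :], scores, -1e9)        # mask out padded keys
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def student_actions(joint_obs, mask, params):
    """Map per-joint observations (MAX_JOINTS, OBS_DIM) to one action per joint."""
    tokens = joint_obs @ params["embed"]
    attended = masked_self_attention(
        tokens, mask, params["w_q"], params["w_k"], params["w_v"]
    )
    hidden = tokens + attended                      # residual connection
    actions = (hidden @ params["head"])[:, 0]
    return np.where(mask, actions, 0.0)             # padded joints output zero

def distillation_loss(student_act, teacher_act, mask):
    """Assumed imitation objective: MSE to the teacher's actions over real joints only."""
    sq_err = (student_act - teacher_act) ** 2
    return float(sq_err[mask].mean())

# Hypothetical sizes: up to 18 joints, 3 observation features per joint.
MAX_JOINTS, OBS_DIM, D_MODEL = 18, 3, 16
rng = np.random.default_rng(0)
params = {
    "embed": rng.normal(scale=0.1, size=(OBS_DIM, D_MODEL)),
    "w_q": rng.normal(scale=0.1, size=(D_MODEL, D_MODEL)),
    "w_k": rng.normal(scale=0.1, size=(D_MODEL, D_MODEL)),
    "w_v": rng.normal(scale=0.1, size=(D_MODEL, D_MODEL)),
    "head": rng.normal(scale=0.1, size=(D_MODEL, 1)),
}

# A 12-joint robot padded to MAX_JOINTS; the mask marks which slots are real joints.
mask = np.arange(MAX_JOINTS) < 12
joint_obs = rng.normal(size=(MAX_JOINTS, OBS_DIM))
teacher_act = rng.normal(size=MAX_JOINTS)  # stand-in for a trained teacher's actions

student_act = student_actions(joint_obs, mask, params)
loss = distillation_loss(student_act, teacher_act, mask)
```

In this sketch the same parameters serve any robot with up to `MAX_JOINTS` joints, because the weights act per token and the mask hides padding. That is one plausible reading of how attention can model joint relationships across kinematic structures, and a real student would stack several such layers and minimize the loss by gradient descent over data from all teachers.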