Unified Locomotion Transformer with Simultaneous Sim-to-Real Transfer for Quadrupeds

📅 2025-03-12

🤖 AI Summary

Background: Existing quadruped locomotion policies rely heavily on multi-stage knowledge distillation (e.g., teacher–student frameworks). This requires a pre-trained teacher model and privileged information, which leads to low training efficiency and poor transferability. Method: We propose the Unified Locomotion Transformer (ULT), a single-stage, end-to-end Transformer architecture that jointly integrates privileged-information-guided policy learning, deep reinforcement learning, self-supervised next-state–action prediction, and behavioral cloning. Instead of sequential distillation, ULT co-optimizes a high-performance teacher policy and a student policy within one unified training objective. Contribution/Results: ULT enables zero-shot sim-to-real deployment without fine-tuning. Experiments show that optimal teacher and student policies are obtained at the same time, greatly easing knowledge transfer even with complex transformer-based models, in simulation and on real quadruped robots.
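The summary says ULT trains a teacher that uses privileged information and a deployable student inside one network. One way to express that in a single transformer is an attention mask: teacher action tokens may read privileged tokens, while student action tokens, and the observation tokens they read, never can. This is purely an illustrative assumption. The token layout, the names (`PROPRIO`, `PRIV`, `TEACHER_ACT`, `STUDENT_ACT`, `build_attention_mask`, `reachable`, `masked_softmax`) and the masking rules are hypothetical, not taken from the paper. The `reachable` helper checks that privileged information cannot leak to the student even through stacked attention layers.

```python
import numpy as np

# Hypothetical per-timestep token layout (an illustrative assumption, not the paper's).
PROPRIO, PRIV, TEACHER_ACT, STUDENT_ACT = range(4)
TOKENS_PER_STEP = 4

# Token kinds each query kind may attend to (within the causal window).
ALLOWED = {
    PROPRIO: {PROPRIO},            # observation stream never reads privileged info
    PRIV: {PROPRIO, PRIV},         # privileged stream may read observations
    TEACHER_ACT: {PROPRIO, PRIV},  # teacher head sees observations and privileged info
    STUDENT_ACT: {PROPRIO},        # student head sees only deployable inputs
}


def build_attention_mask(n_steps: int) -> np.ndarray:
    """mask[q, k] is True when query token q may attend to key token k."""
    n = n_steps * TOKENS_PER_STEP
    mask = np.zeros((n, n), dtype=bool)
    for q in range(n):
        q_step, q_kind = divmod(q, TOKENS_PER_STEP)
        for k in range(n):
            k_step, k_kind = divmod(k, TOKENS_PER_STEP)
            causal = k_step <= q_step  # no attending to future timesteps
            # Every token may attend to itself, so no softmax row is empty.
            mask[q, k] = (causal and k_kind in ALLOWED[q_kind]) or k == q
    return mask


def reachable(mask: np.ndarray, layers: int) -> np.ndarray:
    """reach[q, k] is True if token k can influence query q within `layers` layers."""
    reach = mask.copy()
    for _ in range(layers - 1):
        # Extend every path by one more attention hop.
        reach = reach | ((reach.astype(int) @ mask.astype(int)) > 0)
    return reach


def masked_softmax(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Row-wise softmax over attention scores, with disallowed keys forced to zero weight."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # finite: each row has its diagonal
    weights = np.exp(scores)                              # exp(-inf) == 0 for masked keys
    return weights / weights.sum(axis=-1, keepdims=True)


mask = build_attention_mask(n_steps=3)
reach = reachable(mask, layers=4)
student_rows = [t * TOKENS_PER_STEP + STUDENT_ACT for t in range(3)]
priv_cols = [t * TOKENS_PER_STEP + PRIV for t in range(3)]
print("student can see privileged info:", reach[np.ix_(student_rows, priv_cols)].any())
```

Restricting the observation tokens as well as the student head is the design choice that matters here. If observation tokens could read privileged tokens, a deeper layer could pass that information on to the student, and the student would no longer be deployable without simulation-only inputs.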

📝 Abstract

Quadrupeds have advanced rapidly in their ability to traverse complex terrain. The adoption of deep reinforcement learning (RL), transformers, and various knowledge transfer techniques can greatly reduce the sim-to-real gap. However, the classical teacher-student framework commonly used in existing locomotion policies requires a pre-trained teacher and leverages privileged information to guide the student policy. As large-scale models, especially transformer-based ones, are adopted in robot controllers, this knowledge distillation technique becomes inefficient because it requires multiple supervised stages. In this paper, we propose the Unified Locomotion Transformer (ULT), a new transformer-based framework that unifies knowledge transfer and policy optimization in a single network while still taking advantage of privileged information. The policies are optimized with reinforcement learning, next state-action prediction, and action imitation, all in a single training stage, to achieve zero-shot deployment. Evaluation results demonstrate that with ULT, optimal teacher and student policies can be obtained at the same time, greatly easing the difficulty of knowledge transfer, even with complex transformer-based models.
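The single training stage can be pictured as one loss that sums the three objectives the abstract lists, so that a single optimization step updates teacher and student together. This is a hedged sketch, not the authors' implementation. PPO's clipped surrogate stands in for the unspecified RL objective. Mean-squared error is assumed for both the prediction and imitation terms. The batch keys and the weights `w_pred`/`w_imit` are hypothetical placeholders.

```python
import numpy as np


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective, negated so that lower is better."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return float(-np.mean(np.minimum(unclipped, clipped)))


def mse(pred, target):
    """Mean-squared error between two arrays of the same shape."""
    return float(np.mean((pred - target) ** 2))


def ult_style_loss(batch, w_pred=1.0, w_imit=1.0):
    """Combine RL, next state-action prediction, and imitation into one scalar.

    Key names and weights are hypothetical; PPO and MSE are assumed choices.
    """
    parts = {
        # RL term on the teacher branch (PPO is an assumption, not from the source).
        "rl": ppo_clip_loss(batch["teacher_logp_new"],
                            batch["teacher_logp_old"],
                            batch["advantages"]),
        # Self-supervised next state-action prediction.
        "pred": mse(batch["pred_next_state_action"], batch["next_state_action"]),
        # Action imitation: the student regresses onto the teacher's actions.
        # In an autodiff framework teacher_action would typically be detached
        # (stop-gradient); NumPy has no gradients, so only the value is computed.
        "imit": mse(batch["student_action"], batch["teacher_action"]),
    }
    total = parts["rl"] + w_pred * parts["pred"] + w_imit * parts["imit"]
    return total, parts


batch = {
    "teacher_logp_new": np.zeros(3),
    "teacher_logp_old": np.zeros(3),
    "advantages": np.array([1.0, -1.0, 2.0]),
    "pred_next_state_action": np.ones((3, 4)),
    "next_state_action": np.ones((3, 4)),
    "student_action": np.full((3, 2), 0.5),
    "teacher_action": np.zeros((3, 2)),
}
total, parts = ult_style_loss(batch)
print(total, parts)
```

Because all three terms are minimized jointly, no separate stage is needed in which a frozen, pre-trained teacher supervises the student. In practice these values would be computed inside an autodiff framework so that their gradients flow back through the shared network.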

Problem

Research questions and friction points this paper is trying to address.

Reduces the sim-to-real gap in quadruped locomotion.

Unifies knowledge transfer and policy optimization.

Achieves zero-shot deployment with a single training stage.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified transformer for knowledge transfer and policy optimization

Single-stage training combining reinforcement learning, next state-action prediction, and imitation

Zero-shot deployment for complex transformer-based models

🔎 Similar Papers

Masked Sensory-Temporal Attention for Sensor Generalization in Quadruped Locomotion

2024-09-05 · arXiv.org · Citations: 0