🤖 AI Summary
This work addresses the computational constraints of real-time, on-robot learning of omnidirectional locomotion for quadruped robots. The authors build their framework on CrossQ, a recent off-policy reinforcement learning algorithm chosen for its sample efficiency and low computational overhead, and train policies directly on the robot. They investigate two control architectures: predicting joint target positions, which yields agile, high-speed locomotion, and central pattern generators (CPGs), which yield stable, natural gaits. Experiments show that locomotion can be learned in just 8 minutes of raw real-time training, and that the approach is robust across different indoor and outdoor environments. Key contributions include: (1) extending on-robot learning from the simple forward gaits of prior work to omnidirectional locomotion; (2) using CrossQ's sample efficiency and low computational overhead to make real-time on-robot training practical; and (3) a comparison of joint-target and CPG-based control architectures.
📝 Abstract
On-robot reinforcement learning is a promising approach to train embodiment-aware policies for legged robots. However, the computational constraints of real-time learning on robots pose a significant challenge. We present a framework for efficiently learning quadruped locomotion in just 8 minutes of raw real-time training, utilizing the sample efficiency and minimal computational overhead of the new off-policy algorithm CrossQ. We investigate two control architectures: predicting joint target positions for agile, high-speed locomotion, and Central Pattern Generators for stable, natural gaits. While prior work focused on learning simple forward gaits, our framework extends on-robot learning to omnidirectional locomotion. We demonstrate the robustness of our approach in different indoor and outdoor environments.
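To make the Central Pattern Generator idea concrete, here is a minimal, generic sketch of a phase-based gait generator for four legs. It is an illustration only and not the formulation used in the paper: the function names, the trot phase offsets, the half-sine swing profile, and the 0.05 m swing height are all assumptions. In a learning setup, a policy would typically adjust parameters such as frequency or swing height rather than output joint targets directly.

```python
import math

# Hypothetical trot phase offsets (fractions of a cycle) for legs FL, FR, RL, RR.
# Diagonal leg pairs share a phase, which is what defines a trot.
TROT_OFFSETS = (0.0, 0.5, 0.5, 0.0)


def cpg_step(phase, frequency_hz, dt):
    """Advance the shared gait phase (measured in cycles) and wrap it to [0, 1)."""
    return (phase + frequency_hz * dt) % 1.0


def foot_heights(phase, swing_height=0.05, offsets=TROT_OFFSETS):
    """Map the shared phase to per-leg foot heights in meters.

    The first half of each leg's cycle is swing (half-sine lift),
    the second half is stance (foot on the ground, height 0).
    """
    heights = []
    for offset in offsets:
        leg_phase = (phase + offset) % 1.0
        if leg_phase < 0.5:
            heights.append(swing_height * math.sin(2.0 * math.pi * leg_phase))
        else:
            heights.append(0.0)
    return heights


if __name__ == "__main__":
    phase = 0.0
    for _ in range(5):
        print([round(h, 3) for h in foot_heights(phase)])
        phase = cpg_step(phase, frequency_hz=2.0, dt=0.05)
```

Because the rhythm comes from the oscillator rather than from the policy, the resulting gait stays periodic and symmetric by construction, which is consistent with the abstract's description of CPGs as the option for stable, natural gaits.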