🤖 AI Summary
To address the challenge of safe online learning and adaptive control for quadrupedal robots in dynamic off-road environments, this paper proposes a real-time closed-loop learning framework that spans perception, navigation, and control. Its control module pairs two components: the HP-Student, a high-performance deep reinforcement learning (DRL) policy, and the HA-Teacher, a real-time, formally verifiable physics-based controller. The former enables efficient policy optimization, while the latter ensures formal safety guarantees and emergency intervention. The framework integrates real-time physical modeling, safety-verified control synthesis, and a coordinator that manages the interaction between the two components. It is evaluated on both Unitree Go2 hardware and Isaac Gym simulation. Compared to state-of-the-art safety-aware DRL approaches, the method achieves a 32% improvement in off-road terrain adaptability and reduces the task failure rate by 67%, enhancing operational safety and cross-environment generalization.
📝 Abstract
This paper presents a runtime learning framework for quadruped robots, enabling them to learn and adapt safely in dynamic wild environments. The framework integrates sensing, navigation, and control, forming a closed-loop system for the robot. The core novelty of this framework lies in two interactive and complementary components within the control module: the high-performance (HP)-Student and the high-assurance (HA)-Teacher. HP-Student is a deep reinforcement learning (DRL) agent that engages in self-learning and teaching-to-learn to develop a safe and high-performance action policy. HA-Teacher is a simplified yet verifiable physics-model-based controller whose role is to teach HP-Student about safety while providing a backup for the robot's safe locomotion. HA-Teacher is innovative in its real-time physics model, real-time action policy, and real-time control goals, all tailored to respond to changing wild environments as they are encountered and thereby ensure safety. The framework also includes a coordinator that manages the interaction between HP-Student and HA-Teacher. Experiments involving a Unitree Go2 robot in NVIDIA Isaac Gym and comparisons with state-of-the-art safe DRL methods demonstrate the effectiveness of the proposed runtime learning framework.
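The interaction among HP-Student, HA-Teacher, and the coordinator can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the switching rule, so the box-shaped safety envelope, the `Coordinator` class, the `teaching_buffer`, and both toy policies are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
Action = List[float]
Policy = Callable[[State], Action]


def inside_safety_envelope(state: State, limits: Sequence[float]) -> bool:
    # Assumed safety condition: every state component stays within a symmetric bound.
    return all(abs(x) < lim for x, lim in zip(state, limits))


@dataclass
class Coordinator:
    """Hypothetical coordinator that decides which controller drives the robot."""
    hp_student: Policy
    ha_teacher: Policy
    limits: Sequence[float]
    # Assumed mechanism for "teaching-to-learn": teacher corrections are stored
    # so the student could later train on them.
    teaching_buffer: List[Tuple[Tuple[float, ...], Action]] = field(default_factory=list)

    def step(self, state: State) -> Tuple[Action, str]:
        # Inside the envelope the student acts and keeps learning on its own.
        if inside_safety_envelope(state, self.limits):
            return self.hp_student(state), "HP-Student"
        # Outside the envelope the teacher takes over as the safety backup.
        action = self.ha_teacher(state)
        self.teaching_buffer.append((tuple(state), action))
        return action, "HA-Teacher"


def student_policy(state: State) -> Action:
    # Toy stand-in for a learned DRL policy.
    return [0.5 * x for x in state]


def teacher_policy(state: State) -> Action:
    # Toy stand-in for a model-based controller that pushes the state toward zero.
    return [-1.0 * x for x in state]


coordinator = Coordinator(student_policy, teacher_policy, limits=[1.0, 1.0])
safe_action, safe_source = coordinator.step([0.2, -0.3])
unsafe_action, unsafe_source = coordinator.step([1.4, 0.1])
print(safe_source, unsafe_source, len(coordinator.teaching_buffer))
```

In this sketch the student acts whenever the state lies inside the assumed envelope, and the teacher both overrides the action and records a correction whenever it does not. A practical coordinator would likely also need a rule for when control returns to the student, which is omitted here.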