🤖 AI Summary
To address safety deficiencies, the difficulty of modeling unknown risks, and significant Sim2Real gaps when deploying deep reinforcement learning (DRL) agents in safety-critical autonomous systems, this paper proposes Real-DRL, a novel framework for safe physical deployment. Real-DRL establishes a collaborative learning paradigm between a DRL-based Student agent and a physics-model-driven PHY-Teacher, introducing three key innovations: (1) a bidirectional "teaching–learning co-evolution" paradigm, (2) real-time safety-informed batch sampling, and (3) automatic hierarchical learning. The framework tightly integrates DRL with model-based safety controllers, trigger-managed interaction, and safety-informed batch sampling within a closed-loop architecture. Experimental validation on a real quadruped robot, a simulated quadruped in NVIDIA Isaac Gym, and a cart-pole system demonstrates that Real-DRL maintains safety and high control performance under extreme conditions, significantly improving robustness and generalization across dynamic physical environments.
📝 Abstract
This paper introduces the Real-DRL framework for safety-critical autonomous systems, which enables a deep reinforcement learning (DRL) agent to learn at runtime, developing safe and high-performance action policies in real plants (i.e., the real physical systems to be controlled) while prioritizing safety. Real-DRL consists of three interactive components: a DRL-Student, a PHY-Teacher, and a Trigger. The DRL-Student is a DRL agent whose novelty lies in a dual paradigm of self-learning and teaching-to-learn, combined with real-time safety-informed batch sampling. The PHY-Teacher, in contrast, is a physics-model-based action-policy design that focuses solely on safety-critical functions. Its novelty lies in serving as a real-time patch with two key missions: i) fostering the teaching-to-learn paradigm for the DRL-Student and ii) backing up the safety of real plants. The Trigger manages the interaction between the DRL-Student and the PHY-Teacher. Powered by these three interactive components, Real-DRL can effectively address the safety challenges that arise from unknown unknowns and the Sim2Real gap. Additionally, Real-DRL notably features i) assured safety, ii) automatic hierarchical learning (i.e., safety-first learning, followed by high-performance learning), and iii) safety-informed batch sampling to address the learning-experience imbalance caused by corner cases. Experiments with a real quadruped robot, a quadruped robot in NVIDIA Isaac Gym, and a cart-pole system, along with comparisons and ablation studies, demonstrate Real-DRL's effectiveness and unique features.
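To make the three-component interaction concrete, the following is a minimal sketch of how a Trigger might route each control step between a DRL-Student and a safety-backup PHY-Teacher while collecting teaching examples. It is based only on the abstract's description; the class names, the safety check, and the toy policies are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of Real-DRL's runtime interaction (assumptions labeled):
# a Trigger lets the DRL-Student act inside a safe set and hands control to
# the physics-model-based PHY-Teacher otherwise, logging the Teacher's
# actions as teaching data for the Student's teaching-to-learn updates.

def is_safe(state, limit=1.0):
    """Stand-in safety check: the state stays inside a bounded safe set."""
    return all(abs(x) < limit for x in state)

class Trigger:
    """Routes each control step to the Student or the PHY-Teacher."""
    def __init__(self, student_policy, teacher_policy):
        self.student = student_policy
        self.teacher = teacher_policy
        # Teacher interventions the Student can later learn from
        self.teaching_buffer = []

    def act(self, state):
        if is_safe(state):
            return self.student(state), "student"
        # Safety backup: PHY-Teacher overrides and records a teaching example
        safe_action = self.teacher(state)
        self.teaching_buffer.append((tuple(state), safe_action))
        return safe_action, "teacher"

# Toy stand-ins for the learned policy and the physics-based controller
student = lambda s: 0.0           # untrained Student: no corrective action
teacher = lambda s: -0.5 * s[0]   # simple stabilizing feedback law

trigger = Trigger(student, teacher)
print(trigger.act([0.2]))   # inside the safe set -> Student acts
print(trigger.act([1.5]))   # unsafe -> Teacher overrides, example recorded
```

The design choice this illustrates is the closed loop the abstract describes: the Teacher both guarantees safety at runtime and generates the very data that drives the Student's safety-first, then high-performance, learning.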