🤖 AI Summary
To address safety deficiencies, the difficulty of modeling unknown risks, and significant Sim2Real gaps when deploying deep reinforcement learning (DRL) agents in safety-critical autonomous systems, this paper proposes Real-DRL, a novel framework for safe physical deployment. Real-DRL establishes a collaborative learning paradigm between a DRL-based Student agent and a physics-model-driven PHY-Teacher, introducing three key innovations: (1) a bidirectional "teaching–learning co-evolution" paradigm, (2) real-time safety-informed batch sampling, and (3) automatic hierarchical learning. The framework tightly integrates DRL with model-based safety controllers, trigger-managed interaction, and safety-informed batch sampling within a closed-loop architecture. Experimental validation on a real quadruped robot, a simulated quadruped in NVIDIA Isaac Gym, and a cart-pole system demonstrates that Real-DRL maintains safety and high control performance under extreme conditions, significantly improving robustness and generalization across dynamic physical environments.
📝 Abstract
This paper introduces the Real-DRL framework for safety-critical autonomous systems, which enables a deep reinforcement learning (DRL) agent to learn at runtime, developing safe and high-performance action policies in real plants (i.e., the real physical systems to be controlled) while prioritizing safety. Real-DRL consists of three interactive components: a DRL-Student, a PHY-Teacher, and a Trigger. The DRL-Student is a DRL agent whose novelty lies in a dual paradigm of self-learning and teaching-to-learn, combined with real-time safety-informed batch sampling. The PHY-Teacher, in contrast, is a physics-model-based action-policy design that focuses solely on safety-critical functions. Its novelty lies in serving as a real-time patch with two key missions: i) fostering the teaching-to-learn paradigm for the DRL-Student and ii) backing up the safety of real plants. The Trigger manages the interaction between the DRL-Student and the PHY-Teacher. Powered by these three interactive components, Real-DRL can effectively address the safety challenges that arise from unknown unknowns and the Sim2Real gap. Additionally, Real-DRL notably features i) assured safety, ii) automatic hierarchical learning (i.e., safety-first learning, followed by high-performance learning), and iii) safety-informed batch sampling to address the learning-experience imbalance caused by corner cases. Experiments with a real quadruped robot, a quadruped robot in NVIDIA Isaac Gym, and a cart-pole system, along with comparisons and ablation studies, demonstrate Real-DRL's effectiveness and unique features.
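To make the three-component interaction concrete, the following is a minimal sketch of how a Trigger might route each control step between a DRL-Student and a safety-backup PHY-Teacher while collecting teaching examples. It is based only on the abstract's description; the class names, the safety check, and the toy policies are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of Real-DRL's runtime interaction (assumptions labeled):
# a Trigger lets the DRL-Student act inside a safe set and hands control to
# the physics-model-based PHY-Teacher otherwise, logging the Teacher's
# actions as teaching data for the Student's teaching-to-learn updates.

def is_safe(state, limit=1.0):
    """Stand-in safety check: the state stays inside a bounded safe set."""
    return all(abs(x) < limit for x in state)

class Trigger:
    """Routes each control step to the Student or the PHY-Teacher."""
    def __init__(self, student_policy, teacher_policy):
        self.student = student_policy
        self.teacher = teacher_policy
        # Teacher interventions the Student can later learn from
        self.teaching_buffer = []

    def act(self, state):
        if is_safe(state):
            return self.student(state), "student"
        # Safety backup: PHY-Teacher overrides and records a teaching example
        safe_action = self.teacher(state)
        self.teaching_buffer.append((tuple(state), safe_action))
        return safe_action, "teacher"

# Toy stand-ins for the learned policy and the physics-based controller
student = lambda s: 0.0           # untrained Student: no corrective action
teacher = lambda s: -0.5 * s[0]   # simple stabilizing feedback law

trigger = Trigger(student, teacher)
print(trigger.act([0.2]))   # inside the safe set -> Student acts
print(trigger.act([1.5]))   # unsafe -> Teacher overrides, example recorded
```

The design choice this illustrates is the closed loop the abstract describes: the Teacher both guarantees safety at runtime and generates the very data that drives the Student's safety-first, then high-performance, learning.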