MAGICS: Adversarial RL with Minimax Actors Guided by Implicit Critic Stackelberg for Convergent Neural Synthesis of Robot Safety

📅 2024-09-20
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing neural safety synthesis methods for high-dimensional robot control often lack convergence guarantees and solution interpretability. This paper proposes MAGICS (Minimax Actors Guided by Implicit Critic Stackelberg), an adversarial reinforcement learning algorithm that guarantees local convergence to a minimax equilibrium, and builds on it to provide local convergence guarantees for a general deep RL-based robot safety synthesis algorithm. In simulation studies on OpenAI Gym environments and hardware experiments with a 36-dimensional quadruped robot, MAGICS yields robust control policies that outperform state-of-the-art neural safety synthesis methods.

📝 Abstract
While robust optimal control theory provides a rigorous framework to compute robot control policies that are provably safe, it struggles to scale to high-dimensional problems, leading to increased use of deep learning for tractable synthesis of robot safety. Unfortunately, existing neural safety synthesis methods often lack convergence guarantees and solution interpretability. In this paper, we present Minimax Actors Guided by Implicit Critic Stackelberg (MAGICS), a novel adversarial reinforcement learning (RL) algorithm that guarantees local convergence to a minimax equilibrium solution. We then build on this approach to provide local convergence guarantees for a general deep RL-based robot safety synthesis algorithm. Through both simulation studies on OpenAI Gym environments and hardware experiments with a 36-dimensional quadruped robot, we show that MAGICS can yield robust control policies outperforming the state-of-the-art neural safety synthesis methods.
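The abstract does not spell out the update rule, so the following is only a toy sketch of the general idea behind Stackelberg-style minimax learning, not the MAGICS algorithm itself. It assumes a scalar quadratic zero-sum game in which a leader minimizes over `x` and a follower maximizes over `y`; the objective, the coefficients `a`, `b`, `c`, the learning rates, and all function names are illustrative assumptions. The leader follows the total derivative of the objective through the follower's best response (obtained from the implicit function theorem), while the follower takes plain gradient ascent steps.

```python
# Toy illustration only: a scalar quadratic zero-sum game, not the MAGICS algorithm.
# f(x, y) = 0.5*a*x**2 + b*x*y - 0.5*c*y**2, with c > 0 so that the follower's
# maximization over y is well posed. The leader (x) minimizes, the follower (y) maximizes.


def partials(x, y, a, b, c):
    """Return (df/dx, df/dy, d2f/dxdy, d2f/dy2) for the toy objective."""
    return a * x + b * y, b * x - c * y, b, -c


def stackelberg_step(x, y, a, b, c, lr_leader, lr_follower):
    """One simultaneous update of the leader and the follower."""
    fx, fy, fxy, fyy = partials(x, y, a, b, c)
    # Leader: total derivative of f(x, y*(x)), where dy*/dx = -fxy / fyy
    # comes from the implicit function theorem applied to df/dy = 0.
    leader_grad = fx - fxy * fy / fyy
    x_new = x - lr_leader * leader_grad
    # Follower: plain gradient ascent on f with respect to y.
    y_new = y + lr_follower * fy
    return x_new, y_new


def run_stackelberg(x0, y0, a=1.0, b=2.0, c=1.0,
                    lr_leader=0.05, lr_follower=0.2, steps=200):
    """Iterate the toy dynamics from (x0, y0) and return the final point."""
    x, y = x0, y0
    for _ in range(steps):
        x, y = stackelberg_step(x, y, a, b, c, lr_leader, lr_follower)
    return x, y


if __name__ == "__main__":
    x, y = run_stackelberg(1.0, -1.0)
    print(f"x = {x:.2e}, y = {y:.2e}")
```

With these assumed coefficients and step sizes, both iterates contract toward the equilibrium at the origin. In a deep RL setting the second-derivative terms would not be available in closed form and would have to be estimated, which this sketch does not attempt.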
Problem

Research questions and friction points this paper is trying to address.

Scaling robust optimal control to high-dimensional robot safety problems
Lack of convergence guarantees in neural safety synthesis methods
Ensuring local convergence to minimax equilibrium in adversarial RL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial RL with minimax equilibrium guarantees
Implicit Critic Stackelberg for neural safety synthesis
Convergent deep RL for robust robot control
👥 Authors
Justin Wang
OpenAI, Carnegie Mellon University
Machine Learning, Deep Learning
Haimin Hu
Incoming Assistant Professor, Johns Hopkins University, CS and DSAI
Robotics, Human-Robot Interaction, Safe Learning, Dynamic Games, Trustworthy Autonomy
Duy P. Nguyen
Department of Electrical and Computer Engineering, Princeton University, USA
J. F. Fisac
Department of Electrical and Computer Engineering, Princeton University, USA