On Robust Reinforcement Learning with Lipschitz-Bounded Policy Networks

📅 2024-05-19
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the limited robustness of deep reinforcement learning (DRL) policy networks against disturbances, observation noise, and adversarial attacks. To enhance robustness, the authors study policy parameterizations that satisfy an explicit Lipschitz bound by construction. To balance expressivity against constraint tightness, they adopt the recently proposed Sandwich layer in place of conventional spectral normalization, substantially reducing the policy's Lipschitz constant while preserving task performance on benchmark environments (Pendulum swing-up and Atari Pong). The analysis shows that a smaller Lipschitz bound directly limits the policy's sensitivity to input perturbations, including targeted adversarial ones. Empirical evaluation confirms that policies built from Sandwich layers match unconstrained baselines in the clean environment while being significantly more robust under adversarial conditions. The result is a path toward verifiable, provably robust policy learning: formal guarantees on policy sensitivity without sacrificing task efficacy.
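The robustness claim above rests on a simple property: if a policy has Lipschitz constant L, any input perturbation of size ||δ|| can change its output by at most L·||δ||. For a single linear layer, L (in the 2-norm) is exactly the largest singular value of the weight matrix. A minimal numpy sketch of this bound, with all matrices and names hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single linear layer W. Its Lipschitz constant in the
# 2-norm is the largest singular value of W.
W = rng.standard_normal((4, 8))
L = np.linalg.svd(W, compute_uv=False)[0]

x = rng.standard_normal(8)          # a clean input
delta = 0.1 * rng.standard_normal(8)  # an input perturbation

# The output deviation is bounded by L * ||delta||: a smaller L means
# a provably smaller worst-case response to the perturbation.
deviation = np.linalg.norm(W @ (x + delta) - W @ x)
assert deviation <= L * np.linalg.norm(delta) + 1e-9
```

For a multi-layer network with 1-Lipschitz activations, the product of per-layer constants gives an overall (often loose) bound, which is what Lipschitz-constrained parameterizations keep small by design.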

📝 Abstract
This paper presents a study of robust policy networks in deep reinforcement learning. We investigate the benefits of policy parameterizations that naturally satisfy constraints on their Lipschitz bound, analyzing their empirical performance and robustness on two representative problems: pendulum swing-up and Atari Pong. We illustrate that policy networks with smaller Lipschitz bounds are more robust to disturbances, random noise, and targeted adversarial attacks than unconstrained policies composed of vanilla multi-layer perceptrons or convolutional neural networks. However, the structure of the Lipschitz layer is important. We find that the widely-used method of spectral normalization is too conservative and severely impacts clean performance, whereas more expressive Lipschitz layers such as the recently-proposed Sandwich layer can achieve improved robustness without sacrificing clean performance.
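The spectral normalization baseline the abstract calls "too conservative" rescales each weight matrix by an estimate of its largest singular value, capping each layer at Lipschitz constant 1. A hedged numpy sketch of the standard power-iteration version (function name and constants are illustrative, not from the paper):

```python
import numpy as np

def spectral_normalize(W, n_iters=50):
    """Rescale W so its largest singular value is approximately 1,
    estimated by power iteration on W @ W.T (the usual
    spectral-normalization trick)."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # estimate of the top singular value
    return W / sigma

W = np.random.default_rng(1).standard_normal((16, 16)) * 3.0
W_sn = spectral_normalize(W)
# After normalization the layer is approximately 1-Lipschitz in the 2-norm.
print(np.linalg.svd(W_sn, compute_uv=False)[0])  # ≈ 1.0
```

Because this simply shrinks every direction of the weight matrix, it can also shrink directions the task needs, which is one intuition for the clean-performance loss the abstract reports; more expressive constructions such as the Sandwich layer enforce the same bound without uniformly scaling down all directions.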
Problem

Research questions and friction points this paper is trying to address.

Insufficient robustness of standard policy networks in reinforcement learning
Whether Lipschitz-bound constraints can enhance robustness without degrading clean performance
Conservatism of spectral normalization versus more expressive Lipschitz layers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lipschitz-bounded policy parameterizations
Improved robustness to disturbances, noise, and adversarial attacks
Sandwich layers preserve clean performance under tight Lipschitz bounds
Nicholas H. Barbara
Australian Centre for Robotics, School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Sydney, NSW 2006, Australia
Ruigang Wang
Australian Centre for Robotics, School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Sydney, NSW 2006, Australia
I. Manchester
Australian Centre for Robotics, School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Sydney, NSW 2006, Australia