🤖 AI Summary
This work addresses the limited robustness of deep reinforcement learning (DRL) policy networks to disturbances, noise, and adversarial attacks. To enhance robustness, the authors propose policy parameterizations that satisfy an explicit Lipschitz constraint. To balance expressivity against constraint tightness, they adopt the recently proposed Sandwich layer in place of conventional spectral normalization; it substantially reduces the policy's Lipschitz constant while preserving task performance on benchmark environments (pendulum swing-up and Atari Pong). Theoretical analysis establishes that a smaller Lipschitz bound directly implies stronger robustness to input perturbations and adversarial attacks. Empirical evaluation confirms that policies trained with Sandwich layers match the clean-environment performance of unconstrained baselines while being significantly more robust under adversarial conditions. The work thus offers a path toward verifiably robust policy learning, providing formal guarantees on policy sensitivity without sacrificing task performance.
📝 Abstract
This paper presents a study of robust policy networks in deep reinforcement learning. We investigate the benefits of policy parameterizations that naturally satisfy constraints on their Lipschitz bound, analyzing their empirical performance and robustness on two representative problems: pendulum swing-up and Atari Pong. We show that policy networks with smaller Lipschitz bounds are more robust to disturbances, random noise, and targeted adversarial attacks than unconstrained policies composed of vanilla multi-layer perceptrons or convolutional neural networks. However, the structure of the Lipschitz layer matters. We find that the widely used method of spectral normalization is too conservative and severely degrades clean performance, whereas more expressive Lipschitz layers, such as the recently proposed Sandwich layer, can achieve improved robustness without sacrificing clean performance.
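To make the Lipschitz-bound idea concrete, here is a minimal sketch (not the paper's code) of the standard upper bound used by spectral-normalization-style constraints: for an MLP with 1-Lipschitz activations (e.g. ReLU or tanh), the network's Lipschitz constant is bounded by the product of the per-layer spectral norms. The weights below are random stand-ins, and the layer sizes are illustrative, not taken from the paper.

```python
# Estimating a Lipschitz upper bound of an MLP policy as the product of
# per-layer spectral norms. Valid when activations are 1-Lipschitz.
# All weights here are random placeholders, not trained policies.
import numpy as np

rng = np.random.default_rng(0)

def spectral_norm(W):
    # Largest singular value = operator 2-norm of the linear map x -> W x.
    return np.linalg.svd(W, compute_uv=False)[0]

def lipschitz_upper_bound(weights):
    # A composition of L_i-Lipschitz maps is (prod L_i)-Lipschitz.
    return float(np.prod([spectral_norm(W) for W in weights]))

# Unconstrained 3-layer MLP (illustrative 4 -> 64 -> 64 -> 1 shape).
weights = [rng.normal(size=(64, 4)),
           rng.normal(size=(64, 64)),
           rng.normal(size=(1, 64))]
print(lipschitz_upper_bound(weights))    # large for random Gaussian weights

# Spectral normalization divides each weight by its spectral norm,
# forcing a per-layer bound of 1 and hence a network bound of 1.
normalized = [W / spectral_norm(W) for W in weights]
print(lipschitz_upper_bound(normalized))  # ~1.0
```

A bound of L directly limits sensitivity: for any states s and perturbation delta, the policy output shift satisfies ||pi(s + delta) - pi(s)|| <= L * ||delta||, which is why shrinking L (as the Sandwich layer does more flexibly than plain normalization) improves robustness to noise and attacks.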