🤖 AI Summary
Verifying the stability of learning-based control policies—particularly those derived from reinforcement learning (RL)—remains challenging, as classical Lyapunov methods require strictly decreasing analytical certificates that are difficult to construct for learned policies. To address this, the authors propose a learnable generalized Lyapunov verification framework. The method extends LQR intuition to nonlinear RL settings by adopting a generalized Lyapunov condition that requires only a multi-step average decrease. Augmenting the RL value function with a neural-network-parameterized residual term yields a learnable certificate for a fixed policy; the framework further supports end-to-end joint training of the controller and its certificate, which significantly enlarges the certified inner approximation of the region of attraction. The approach is validated on standard benchmarks—including Gymnasium and DeepMind Control—certifying the stability of multiple RL policies. This work bridges modern learning-based control and classical stability theory, offering a scalable, data-driven paradigm for stability verification.
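The relaxation at the heart of the method can be written out explicitly. As a sketch (the notation below is ours, not necessarily the paper's), a classical Lyapunov certificate for the closed-loop map $x_{t+1} = f(x_t, \pi(x_t))$ demands a strict decrease at every step, while the generalized condition only asks for decrease on average over a window of $N$ steps:

```latex
% Classical condition: strict one-step decrease along every trajectory
V(x_{t+1}) < V(x_t) \quad \text{for all } t .

% Generalized condition (sketch): decrease on average over N steps,
% which every classical certificate satisfies, but not conversely
\frac{1}{N}\sum_{k=1}^{N} V(x_{t+k}) < V(x_t) \quad \text{for all } t .
```

Any function satisfying the classical condition satisfies the generalized one (each summand is below $V(x_t)$), so the generalized condition admits a strictly larger set of valid certificates, which is what makes them easier to construct for learned policies.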
📝 Abstract
We study the problem of certifying the stability of closed-loop systems under control policies derived from optimal control or reinforcement learning (RL). Classical Lyapunov methods require a strict step-wise decrease in the Lyapunov function, but such a certificate is difficult to construct for a learned control policy. The value function associated with an RL policy is a natural Lyapunov function candidate, but it is not clear how it should be modified. To gain intuition, we first study the linear quadratic regulator (LQR) problem and make two key observations. First, a Lyapunov function can be obtained from the value function of an LQR policy by augmenting it with a residual term related to the system dynamics and stage cost. Second, the classical Lyapunov decrease requirement can be relaxed to a generalized Lyapunov condition requiring only decrease on average over multiple time steps. Using this intuition, we consider the nonlinear setting and formulate an approach to learn generalized Lyapunov functions by augmenting RL value functions with neural network residual terms. Our approach successfully certifies the stability of RL policies trained on Gymnasium and DeepMind Control benchmarks. We also extend our method to jointly train neural controllers and stability certificates using a multi-step Lyapunov loss, resulting in larger certified inner approximations of the region of attraction compared to the classical Lyapunov approach. Overall, our formulation enables stability certification for a broad class of systems with learned policies by making certificates easier to construct, thereby bridging classical control theory and modern learning-based methods.
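To make the multi-step Lyapunov loss concrete, here is a minimal NumPy sketch under our own assumptions (the function names, the hinge form of the loss, the contraction margin `eps`, and the toy linear closed loop all stand in for details the abstract does not specify). The candidate certificate is a quadratic "value-function-like" term; in the paper's setting it would additionally carry a neural-network residual, which we omit here. The loss penalizes violations of an N-step decrease condition `W(x_{t+N}) <= (1 - eps) * W(x_t)` averaged over sampled states:

```python
import numpy as np

def multi_step_lyapunov_loss(W, xs, step, N=3, eps=0.05):
    """Hinge penalty on violations of an N-step decrease condition
    W(x_{t+N}) <= (1 - eps) * W(x_t), averaged over sampled states.
    `W` is the candidate certificate, `step` advances the closed loop
    by one time step. (Sketch; not the paper's exact loss.)"""
    losses = []
    for x in xs:
        xN = x
        for _ in range(N):          # roll the closed loop forward N steps
            xN = step(xN)
        losses.append(max(0.0, W(xN) - (1.0 - eps) * W(x)))
    return float(np.mean(losses))

# Toy stable linear closed loop x_{t+1} = A x_t, standing in for f(x, pi(x)).
A = np.array([[0.6, 0.4],
              [-0.3, 0.7]])        # spectral radius ~0.73 < 1
Q = np.eye(2)

# Quadratic candidate W(x) = x^T P x, with P the series solution of the
# discrete Lyapunov equation P = A^T P A + Q (converges since A is stable).
P = np.zeros((2, 2))
Ak = np.eye(2)
for _ in range(500):
    P += Ak.T @ Q @ Ak
    Ak = A @ Ak

W = lambda x: float(x @ P @ x)     # certificate candidate (residual omitted)
step = lambda x: A @ x             # one closed-loop step

rng = np.random.default_rng(0)
xs = rng.normal(size=(64, 2))
print(multi_step_lyapunov_loss(W, xs, step, N=3, eps=0.05))  # 0.0: condition holds
```

In a training loop, `W` would include trainable residual parameters and this loss would be minimized jointly with the policy objective; states where the loss is zero form the data-driven evidence for the certified region.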