🤖 AI Summary
This work addresses the optimal control policy learning problem for partially observable near-linear quadratic regulator (NLQR) systems governed by hybrid dynamics with small Lipschitz-continuous nonlinear perturbations. We propose a policy gradient algorithm and, under a suitably designed initialization scheme, provide the first rigorous proof of its global linear convergence to the optimal policy. Our key theoretical contributions are: (i) establishing that the nonconvex control cost function exhibits local strong convexity and smoothness in a neighborhood of the global optimum, enabling convergence guarantees; and (ii) developing a tailored initialization mechanism and optimization landscape analysis framework grounded in this geometric property. To our knowledge, this is the first reinforcement learning method for partially observable nonlinear control systems with provable linear convergence, thereby filling a fundamental theoretical gap in the NLQR literature.
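For intuition, here is a minimal, self-contained sketch of the ingredients described above: a linear policy optimized by policy gradient on a near-linear system, warm-started from the solution of the unperturbed LQR. Everything here is an illustrative assumption, not the paper's algorithm: the dynamics, the `tanh` perturbation (Lipschitz with constant `eps`), the finite-difference gradient estimator, and all constants are placeholders, and the sketch ignores partial observability.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Illustrative near-linear system (assumed, not the paper's exact model):
# x_{t+1} = A x_t + B u_t + eps * tanh(x_t), cost c(x, u) = x'Qx + u'Ru,
# controlled by a linear policy u_t = -K x_t.
rng = np.random.default_rng(0)
n, m, T, eps = 4, 2, 50, 0.05
A = 0.9 * np.eye(n) + 0.05 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Q, R = np.eye(n), np.eye(m)
X0 = rng.standard_normal((8, n))  # fixed initial states -> deterministic cost

def cost(K):
    """Average finite-horizon cost of the policy u = -K x."""
    total = 0.0
    for x0 in X0:
        x = x0.copy()
        for _ in range(T):
            u = -K @ x
            total += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u + eps * np.tanh(x)  # small Lipschitz perturbation
    return total / len(X0)

def grad(K, delta=1e-4):
    """Central finite-difference estimate of the policy gradient."""
    g = np.zeros_like(K)
    for i in range(m):
        for j in range(n):
            E = np.zeros_like(K)
            E[i, j] = delta
            g[i, j] = (cost(K + E) - cost(K - E)) / (2 * delta)
    return g

# Warm start: the exact LQR gain of the unperturbed (eps = 0) system,
# standing in for the paper's tailored initialization scheme. This places
# K in the region where the cost is locally strongly convex and smooth.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

for it in range(30):  # plain gradient descent on the policy parameters
    K -= 1e-4 * grad(K)
    if it % 10 == 0:
        print(f"iter {it:2d}  cost {cost(K):.4f}")
```

Under local strong convexity and smoothness, gradient descent from such a warm start contracts the optimality gap by a constant factor per step, which is the linear-rate behavior the paper proves for its (more general, partially observable) setting.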
📄 Abstract
Nonlinear control systems with partial information available to the decision maker are prevalent in a variety of applications. As a step toward studying such nonlinear systems, this work explores reinforcement learning methods for finding the optimal policy in nearly linear-quadratic regulator systems. In particular, we consider a dynamic system that combines linear and nonlinear components and is governed by a policy with the same structure. Assuming that the nonlinear component comprises kernels with small Lipschitz coefficients, we characterize the optimization landscape of the cost function. Although the cost function is nonconvex in general, we establish its local strong convexity and smoothness in the vicinity of the global optimizer. Additionally, we propose an initialization mechanism that leverages these properties. Building on these developments, we design a policy gradient algorithm that is guaranteed to converge to the globally optimal policy at a linear rate.
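As a concrete reading of this setup, one plausible formulation (illustrative notation; the symbols A, B, K, f, g, Q, R are assumed here, not taken from the paper) pairs linear dynamics and a linear policy with small nonlinear terms and a quadratic cost:

```latex
% Illustrative near-linear dynamics, policy, and cost; notation assumed.
\begin{aligned}
  x_{t+1} &= A x_t + B u_t + f(x_t), \qquad u_t = -K x_t - g(x_t),\\
  J(K, g) &= \mathbb{E}_{x_0}\!\left[\sum_{t=0}^{\infty}
             \left(x_t^{\top} Q x_t + u_t^{\top} R u_t\right)\right],
\end{aligned}
```

where f and g have small Lipschitz constants, so that both the system and the policy are "nearly linear" and the landscape of J stays close to that of the classical LQR cost.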