Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators

๐Ÿ“… 2023-03-15
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 5
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses optimal policy learning for nearly linear-quadratic regulator (NLQR) systems, whose dynamics combine a linear component with a small Lipschitz-continuous nonlinear perturbation and are governed by a policy of the same structure. The authors propose a policy gradient algorithm and, under a suitably designed initialization scheme, prove its global linear convergence to the optimal policy. The key theoretical contributions are: (i) establishing that the generally nonconvex control cost is locally strongly convex and smooth in a neighborhood of the global optimizer, which enables the convergence guarantee; and (ii) developing a tailored initialization mechanism and an optimization-landscape analysis grounded in this geometric property. This yields a reinforcement learning method for nearly linear-quadratic control with a provable linear convergence rate, presented as a step toward the broader setting of nonlinear systems with partial information.
๐Ÿ“ Abstract
Nonlinear control systems with partial information available to the decision maker are prevalent in a variety of applications. As a step toward studying such nonlinear systems, this work explores reinforcement learning methods for finding the optimal policy in nearly linear-quadratic regulator systems. In particular, we consider a dynamic system that combines linear and nonlinear components, and is governed by a policy with the same structure. Assuming that the nonlinear component comprises kernels with small Lipschitz coefficients, we characterize the optimization landscape of the cost function. Although the cost function is nonconvex in general, we establish local strong convexity and smoothness in the vicinity of the global optimizer. Additionally, we propose an initialization mechanism to leverage these properties. Building on these developments, we design a policy gradient algorithm that is guaranteed to converge to the globally optimal policy at a linear rate.
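The setup in the abstract can be illustrated with a minimal sketch: a scalar system whose dynamics add a small Lipschitz nonlinearity (here a sine term with coefficient `EPS`) to linear dynamics, with a linear policy `u = -k*x` trained by gradient descent on the finite-horizon cost. All constants, the finite-difference gradient, and the scalar restriction are illustrative assumptions, not the paper's actual algorithm or initialization scheme.

```python
import numpy as np

# Illustrative toy instance of a nearly linear-quadratic regulator
# (constants are made up; this is not the paper's setting or algorithm).
A, B = 0.9, 1.0      # linear dynamics coefficients
Q, R = 1.0, 0.1      # quadratic state and control costs
EPS = 0.05           # small Lipschitz coefficient of the nonlinear kernel
HORIZON, X0 = 50, 1.0

def cost(k):
    """Finite-horizon quadratic cost under the linear policy u = -k*x."""
    x, total = X0, 0.0
    for _ in range(HORIZON):
        u = -k * x
        total += Q * x**2 + R * u**2
        x = A * x + B * u + EPS * np.sin(x)  # nearly linear dynamics
    return total

def policy_gradient(k0, lr=0.05, iters=200, h=1e-5):
    """Gradient descent on the cost with a central finite-difference gradient."""
    k = k0
    for _ in range(iters):
        grad = (cost(k + h) - cost(k - h)) / (2 * h)
        k -= lr * grad
    return k

k_star = policy_gradient(k0=0.5)  # k0 chosen inside the stabilizing region
print(f"learned gain k = {k_star:.4f}, cost = {cost(k_star):.4f}")
```

Starting from a stabilizing gain, the iterates descend toward the unique minimizer of this one-dimensional cost; the paper's contribution is proving that an analogous descent converges linearly to the global optimum in the full (non-scalar, kernel-parameterized) setting.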
Problem

Research questions and friction points this paper is trying to address.

Finding optimal policy in nearly linear-quadratic regulator systems
Analyzing optimization landscape for nonlinear control systems
Ensuring policy gradient convergence to global optimum
Innovation

Methods, ideas, or system contributions that make the work stand out.

Policy gradient for nearly linear-quadratic regulators
Local strong convexity near global optimizer
Linear-rate convergence to optimal policy
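The "local strong convexity near the global optimizer" property can be probed numerically in a toy scalar instance: even though the cost need not be convex everywhere, its finite-difference curvature near the minimizing gain is strictly positive. The dynamics, constants, and probe point below are assumptions for illustration only, not taken from the paper.

```python
import numpy as np

# Hypothetical numerical probe of local strong convexity for a scalar
# nearly linear-quadratic cost (all constants are illustrative).
A, B, Q, R, EPS, HORIZON, X0 = 0.9, 1.0, 1.0, 0.1, 0.05, 50, 1.0

def cost(k):
    """Finite-horizon cost of the linear policy u = -k*x on the toy system."""
    x, total = X0, 0.0
    for _ in range(HORIZON):
        u = -k * x
        total += Q * x**2 + R * u**2
        x = A * x + B * u + EPS * np.sin(x)
    return total

def second_derivative(k, h=1e-4):
    """Central finite-difference estimate of the cost's curvature at k."""
    return (cost(k + h) - 2 * cost(k) + cost(k - h)) / h**2

# Positive curvature near the optimal gain = local strong convexity.
print(second_derivative(0.82))
```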
๐Ÿ”Ž Similar Papers
No similar papers found.
Yinbin Han
Daniel J. Epstein Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA 90089, USA
Meisam Razaviyayn
University of Southern California
Optimization, Machine Learning
Renyuan Xu
Stanford University
Mathematical Finance, Stochastic Analysis, Generative AI, Reinforcement Learning, Game Theory