🤖 AI Summary
This work addresses the optimal control policy learning problem for partially observable near-linear quadratic regulator (NLQR) systems governed by hybrid dynamics with small Lipschitz-continuous nonlinear perturbations. We propose a policy gradient algorithm and, under a suitably designed initialization scheme, provide the first rigorous proof of its global linear convergence to the optimal policy. Our key theoretical contributions are: (i) establishing that the nonconvex control cost function exhibits local strong convexity and smoothness in a neighborhood of the global optimum, enabling convergence guarantees; and (ii) developing a tailored initialization mechanism and optimization landscape analysis framework grounded in this geometric property. To our knowledge, this is the first reinforcement learning method for partially observable nonlinear control systems with provable linear convergence, thereby filling a fundamental theoretical gap in the NLQR literature.
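For intuition, here is a minimal, self-contained sketch of the ingredients described above: a linear policy optimized by policy gradient on a near-linear system, warm-started from the solution of the unperturbed LQR. Everything here is an illustrative assumption, not the paper's algorithm: the dynamics, the `tanh` perturbation (Lipschitz with constant `eps`), the finite-difference gradient estimator, and all constants are placeholders, and the sketch ignores partial observability.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Illustrative near-linear system (assumed, not the paper's exact model):
# x_{t+1} = A x_t + B u_t + eps * tanh(x_t), cost c(x, u) = x'Qx + u'Ru,
# controlled by a linear policy u_t = -K x_t.
rng = np.random.default_rng(0)
n, m, T, eps = 4, 2, 50, 0.05
A = 0.9 * np.eye(n) + 0.05 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Q, R = np.eye(n), np.eye(m)
X0 = rng.standard_normal((8, n))  # fixed initial states -> deterministic cost

def cost(K):
    """Average finite-horizon cost of the policy u = -K x."""
    total = 0.0
    for x0 in X0:
        x = x0.copy()
        for _ in range(T):
            u = -K @ x
            total += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u + eps * np.tanh(x)  # small Lipschitz perturbation
    return total / len(X0)

def grad(K, delta=1e-4):
    """Central finite-difference estimate of the policy gradient."""
    g = np.zeros_like(K)
    for i in range(m):
        for j in range(n):
            E = np.zeros_like(K)
            E[i, j] = delta
            g[i, j] = (cost(K + E) - cost(K - E)) / (2 * delta)
    return g

# Warm start: the exact LQR gain of the unperturbed (eps = 0) system,
# standing in for the paper's tailored initialization scheme. This places
# K in the region where the cost is locally strongly convex and smooth.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

for it in range(30):  # plain gradient descent on the policy parameters
    K -= 1e-4 * grad(K)
    if it % 10 == 0:
        print(f"iter {it:2d}  cost {cost(K):.4f}")
```

Under local strong convexity and smoothness, gradient descent from such a warm start contracts the optimality gap by a constant factor per step, which is the linear-rate behavior the paper proves for its (more general, partially observable) setting.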
📄 Abstract
Nonlinear control systems with partial information available to the decision maker are prevalent in a variety of applications. As a step toward studying such nonlinear systems, this work explores reinforcement learning methods for finding the optimal policy in nearly linear-quadratic regulator systems. In particular, we consider a dynamic system that combines linear and nonlinear components and is governed by a policy with the same structure. Assuming that the nonlinear component comprises kernels with small Lipschitz coefficients, we characterize the optimization landscape of the cost function. Although the cost function is nonconvex in general, we establish its local strong convexity and smoothness in the vicinity of the global optimizer. Additionally, we propose an initialization mechanism that leverages these properties. Building on these developments, we design a policy gradient algorithm that is guaranteed to converge to the globally optimal policy at a linear rate.
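As a concrete reading of this setup, one plausible formulation (illustrative notation; the symbols A, B, K, f, g, Q, R are assumed here, not taken from the paper) pairs linear dynamics and a linear policy with small nonlinear terms and a quadratic cost:

```latex
% Illustrative near-linear dynamics, policy, and cost; notation assumed.
\begin{aligned}
  x_{t+1} &= A x_t + B u_t + f(x_t), \qquad u_t = -K x_t - g(x_t),\\
  J(K, g) &= \mathbb{E}_{x_0}\!\left[\sum_{t=0}^{\infty}
             \left(x_t^{\top} Q x_t + u_t^{\top} R u_t\right)\right],
\end{aligned}
```

where f and g have small Lipschitz constants, so that both the system and the policy are "nearly linear" and the landscape of J stays close to that of the classical LQR cost.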