Sample Complexity of Linear Quadratic Regulator Without Initial Stability

📅 2025-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the problem of learning linear quadratic regulators (LQR) with unknown parameters from unstable initial policies. Existing approaches either require a stabilizing initial policy or rely on two-point gradient estimation; to overcome these limitations, we propose a novel receding-horizon policy gradient algorithm. Our method is the first to eliminate the requirement of an initially stabilizing policy while retaining the $\tilde{O}(1/\varepsilon^2)$ sample complexity. By introducing a contraction analysis of the Riemannian distance under the Riccati operator, we significantly strengthen convergence guarantees and improve sample efficiency. Theoretically, our algorithm achieves superior sample complexity compared to existing single-point gradient methods. Numerical experiments demonstrate rapid convergence even from highly unstable initial policies and confirm strong robustness against system disturbances and modeling errors.
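As a concrete illustration of the single-point (REINFORCE-style) gradient estimation the summary refers to, here is a minimal sketch on a toy LQR instance. The system matrices, horizon, and smoothing radius are illustrative assumptions, not the paper's setup; the point is that one cost query per perturbation suffices, unlike two-point schemes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy system x_{t+1} = A x_t + B u_t (illustrative; not the paper's benchmark)
A = np.array([[1.2, 0.5], [0.0, 1.1]])  # spectral radius > 1: open loop unstable
B = np.eye(2)
Q, R = np.eye(2), np.eye(2)

def cost(K, T=20):
    """Finite-horizon LQR cost of the linear policy u_t = -K x_t."""
    x = np.ones(2)
    J = 0.0
    for _ in range(T):
        u = -K @ x
        J += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return J

def one_point_grad(K, r=0.1):
    """Single-point zeroth-order gradient estimate: perturb K once,
    query the cost once, and scale the perturbation direction.
    Unbiased for the r-smoothed cost; no second rollout is needed."""
    U = rng.standard_normal(K.shape)
    U /= np.linalg.norm(U)                     # uniform direction on the sphere
    return (K.size / r) * cost(K + r * U) * U

g = one_point_grad(np.zeros((2, 2)))           # well defined even for an unstable policy
```

Because only one rollout is evaluated per estimate, this estimator is noisier than a two-point difference, which is why the refined analysis is needed to keep the overall sample complexity at the same order.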

📝 Abstract
Inspired by REINFORCE, we introduce a novel receding-horizon algorithm for the Linear Quadratic Regulator (LQR) problem with unknown parameters. Unlike prior methods, our algorithm avoids reliance on two-point gradient estimates while maintaining the same order of sample complexity. Furthermore, it eliminates the restrictive requirement of starting with a stabilizing initial policy, broadening its applicability. Beyond these improvements, we present a refined analysis of error propagation via the contraction of the Riemannian distance under the Riccati operator. This refinement yields a sharper sample complexity bound and stronger convergence guarantees. Numerical simulations validate the theoretical results, demonstrating the method's practical feasibility and performance in realistic scenarios.
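The contraction property underlying the refined error-propagation analysis can be checked numerically. The sketch below, on an assumed toy system (not taken from the paper), applies the discrete-time Riccati operator to two different positive-definite matrices and measures their Riemannian distance on the SPD cone; the distance shrinks after one step, which is the classical contraction behavior the analysis exploits.

```python
import numpy as np

# Illustrative system (not taken from the paper)
A = np.array([[1.2, 0.5], [0.0, 1.1]])
B = np.eye(2)
Q, R = np.eye(2), np.eye(2)

def riccati_step(P):
    """One application of the discrete-time Riccati operator R(P)."""
    G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return A.T @ P @ A - A.T @ P @ B @ G + Q

def riemannian_dist(P1, P2):
    """Riemannian distance on the SPD cone: ||log(P1^{-1/2} P2 P1^{-1/2})||_F,
    computed from the eigenvalues of the whitened matrix."""
    w, V = np.linalg.eigh(P1)
    P1_inv_half = V @ np.diag(w ** -0.5) @ V.T
    lam = np.linalg.eigvalsh(P1_inv_half @ P2 @ P1_inv_half)
    return np.sqrt(np.sum(np.log(lam) ** 2))

P1, P2 = np.eye(2), 10.0 * np.eye(2)
d_before = riemannian_dist(P1, P2)
d_after = riemannian_dist(riccati_step(P1), riccati_step(P2))
# d_after < d_before: the Riccati operator contracts the Riemannian distance
```

Iterating the operator from any two starting points thus drives them toward the same fixed point (the solution of the algebraic Riccati equation), which is what lets estimation errors contract rather than accumulate.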
Problem

Research questions and friction points this paper is trying to address.

Sample complexity for LQR
No initial stability requirement
Improved convergence guarantees
Innovation

Methods, ideas, or system contributions that make the work stand out.

Receding-horizon algorithm introduced
Avoids two-point gradient estimates
No stable initial policy required
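One way to see why a receding-horizon formulation can dispense with a stabilizing initial policy: each subproblem has a finite horizon, so the cost of even a destabilizing gain is finite, and cost queries and gradient estimates stay well defined from the very first iteration. A minimal sketch on an assumed toy system:

```python
import numpy as np

# Illustrative system (assumed for this sketch); A is open-loop unstable
A = np.array([[1.2, 0.5], [0.0, 1.1]])
B = np.eye(2)
Q, R = np.eye(2), np.eye(2)

def finite_horizon_cost(K, T):
    """Cost of u_t = -K x_t over T steps; finite for any K, even if
    the closed loop A - B K is unstable (unlike the infinite-horizon cost)."""
    x = np.ones(2)
    J = 0.0
    for _ in range(T):
        u = -K @ x
        J += x @ Q @ x + u @ R @ u
        x = (A - B @ K) @ x
    return J

K0 = np.zeros((2, 2))                     # destabilizing: closed loop is just A
rho = max(abs(np.linalg.eigvals(A - B @ K0)))   # > 1
# The cost grows with the horizon but is finite at every T, so
# gradient-based updates remain well defined from an unstable start.
J10, J20 = finite_horizon_cost(K0, 10), finite_horizon_cost(K0, 20)
```

By contrast, the infinite-horizon cost of `K0` is infinite, which is why methods that optimize it directly must be initialized with a stabilizing policy.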