🤖 AI Summary
This paper addresses the model-free stabilization of unknown linear time-invariant (LTI) systems, aiming to efficiently learn a stabilizing controller without requiring an initial stabilizing policy. We propose a two-stage approach: first, identifying the left unstable subspace of the system; second, solving a discounted linear quadratic regulator (LQR) problem exclusively within this low-dimensional subspace, thereby controlling only the unstable modes. By focusing stabilization efforts solely on the unstable subspace, our method drastically reduces the effective control dimension. Crucially, we establish the first non-asymptotic theoretical guarantee for such a scheme, improving the sample complexity from the conventional $O(n^2)$ to $O(r^2)$, where $r$ denotes the number of unstable modes and satisfies $r \ll n$. Numerical experiments corroborate the enhanced stabilization efficiency and align with our theoretical predictions.
📝 Abstract
We study the problem of learning to stabilize (LTS) a linear time-invariant (LTI) system. Policy gradient (PG) methods for control assume access to an initial stabilizing policy. However, designing such a policy for an unknown system is one of the most fundamental problems in control, and it may be as hard as learning the optimal policy itself. Existing work on the LTS problem requires large amounts of data, with sample complexity that scales quadratically with the ambient dimension. We propose a two-phase approach that first learns the left unstable subspace of the system and then solves a series of discounted linear quadratic regulator (LQR) problems on the learned unstable subspace, aiming to stabilize only the system's unstable dynamics and to reduce the effective dimension of the control space. We provide non-asymptotic guarantees for both phases and demonstrate that operating on the unstable subspace reduces sample complexity. In particular, when the number of unstable modes is much smaller than the state dimension, our analysis reveals that LTS on the unstable subspace substantially speeds up the stabilization process. Numerical experiments are provided to support the sample complexity reduction achieved by our approach.
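To make the two-phase idea concrete, the following is a minimal sketch and not the paper's algorithm. The paper works model-free, estimating the subspace from data and using policy gradient; the sketch instead assumes that $A$ and $B$ are known, obtains the left unstable subspace from an eigendecomposition of $A^\top$, and solves the reduced LQR problem by Riccati iteration. The example system, the function names `left_unstable_basis` and `lqr_gain`, and the cost matrices are all hypothetical. The `gamma` argument relies on the standard fact that a discounted LQR problem with factor $\gamma$ is an undiscounted one on $(\sqrt{\gamma}A, \sqrt{\gamma}B)$.

```python
import numpy as np

def left_unstable_basis(A):
    """Orthonormal basis of the left unstable subspace of A.

    Assumption for illustration: A is known. The paper estimates this
    subspace from data instead.
    """
    eigvals, eigvecs = np.linalg.eig(A.T)        # left eigenvectors of A
    V = eigvecs[:, np.abs(eigvals) > 1.0]        # keep the unstable modes
    r = V.shape[1]
    # Real and imaginary parts together span the real invariant subspace.
    U, _, _ = np.linalg.svd(np.hstack([V.real, V.imag]), full_matrices=False)
    return U[:, :r]

def lqr_gain(A, B, Q, R, gamma=1.0, iters=2000):
    """LQR gain by Riccati iteration (a stand-in for policy gradient).

    Discounting by gamma is handled by scaling A and B with sqrt(gamma).
    """
    Ag, Bg = np.sqrt(gamma) * A, np.sqrt(gamma) * B
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + Bg.T @ P @ Bg, Bg.T @ P @ Ag)
        P = Q + Ag.T @ P @ (Ag - Bg @ G)
    return np.linalg.solve(R + Bg.T @ P @ Bg, Bg.T @ P @ Ag)

def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))

# Hypothetical system: n = 6 states, r = 2 unstable modes (1.5 and 1.2).
n = 6
A = np.diag([1.5, 1.2, 0.5, 0.3, -0.4, 0.1]) + 0.2 * np.triu(np.ones((n, n)), k=1)
B = np.vstack([np.eye(2), np.zeros((n - 2, 2))])

# Phase 1: left unstable subspace, so that z = Phi^T x has closed dynamics.
Phi = left_unstable_basis(A)
A_u = Phi.T @ A @ Phi        # r x r reduced dynamics
B_u = Phi.T @ B              # r x m reduced input matrix

# Phase 2: LQR on the r-dimensional subspace, then lift the gain back.
K_u = lqr_gain(A_u, B_u, Q=np.eye(A_u.shape[0]), R=np.eye(B.shape[1]))
K = K_u @ Phi.T              # full-state feedback u = -K x

rho_open = spectral_radius(A)
rho_closed = spectral_radius(A - B @ K)
print("open-loop spectral radius:  ", rho_open)
print("closed-loop spectral radius:", rho_closed)
```

Because $\Phi^\top A = (\Phi^\top A \Phi)\Phi^\top$ on a left invariant subspace, the projected state $z_t = \Phi^\top x_t$ evolves independently of the stable modes. The design problem therefore involves only an $r \times r$ system, and the stable modes are left untouched by the feedback.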