🤖 AI Summary
This work addresses adaptive control of the stochastic linear quadratic regulator (LQR) under chance constraints that must hold at every time step. The authors propose a safe, optimism-based method that uses a semidefinite program (SDP) to select an optimistic policy and then scales that policy back until it is verifiably safe, so that the constraints are satisfied at every step while regret stays low. The main result is a regret bound of $\tilde{O}(\sqrt{T})$ for constrained LQR, improving on the $\tilde{O}(T^{2/3})$ rate shown in prior work on the multidimensional problem. Chance constraints allow the approach to handle unbounded process noise, and the analysis rests on a key lemma that bounds the system covariance in terms of the chosen policy, in contrast with the cost-to-go-based analyses typical of adaptive LQR.
📝 Abstract
We study the problem of adaptive control of the stochastic linear quadratic regulator (LQR) with constraints that must be satisfied at every time step. Prior work on the multidimensional problem has shown $\tilde{O}(T^{2/3})$ regret and satisfaction of robust constraints, leaving open the question of whether $\tilde{O}(\sqrt{T})$ regret can be attained in the constrained LQR setting. We contribute to this problem by showing $\tilde{O}(\sqrt{T})$ regret and satisfaction of chance constraints. This type of constraint allows us to handle unbounded noise and also enables analytical techniques not directly applicable to robust constraints. Our proposed algorithm for this problem uses an SDP to select an optimistic policy, and then "scales back" this policy until it is verifiably safe. Our theoretical analysis establishes regret and constraint guarantees via a key lemma that bounds the system covariance in terms of the chosen policy. This covariance-based analysis is in contrast with the cost-to-go-based analysis that is typically used in adaptive LQR.
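To make the "scale back" idea concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper: the dynamics $x_{t+1} = A x_t + B u_t + w_t$ are known, the noise is Gaussian with covariance $W$, the policy is linear ($u_t = K x_t$), and the chance constraint is a single half-space $P(a^\top x \le b) \ge 1 - \delta$ evaluated at the stationary covariance. Under these assumptions the constraint reduces to $z_{1-\delta}\sqrt{a^\top \Sigma a} \le b$, where $\Sigma$ solves $\Sigma = (A + BK)\Sigma(A + BK)^\top + W$. The names `stationary_covariance`, `is_safe`, `scale_back`, `K_safe`, and `K_opt` are hypothetical, and the interpolation between a known safe gain and an optimistic gain stands in for the paper's procedure, which the abstract does not specify.

```python
from statistics import NormalDist

import numpy as np


def stationary_covariance(A, B, K, W):
    """Solve Sigma = M Sigma M^T + W with M = A + B K; return None if M is unstable."""
    M = A + B @ K
    if np.max(np.abs(np.linalg.eigvals(M))) >= 1.0:
        return None
    n = M.shape[0]
    # vec(M Sigma M^T) = (M kron M) vec(Sigma), so the equation is linear in vec(Sigma).
    vec_sigma = np.linalg.solve(np.eye(n * n) - np.kron(M, M), W.reshape(-1))
    return vec_sigma.reshape(n, n)


def is_safe(A, B, K, W, a, b, delta):
    """Check P(a^T x <= b) >= 1 - delta for x ~ N(0, Sigma) at stationarity."""
    sigma = stationary_covariance(A, B, K, W)
    if sigma is None:
        return False
    z = NormalDist().inv_cdf(1.0 - delta)  # standard normal quantile
    return bool(z * np.sqrt(a @ sigma @ a) <= b)


def scale_back(A, B, K_safe, K_opt, W, a, b, delta, iters=50):
    """Bisect for a large gamma in [0, 1] such that the interpolated gain is safe.

    Assumption (illustrative only): K_safe is a known safe gain and the
    candidate policies are K_safe + gamma * (K_opt - K_safe).
    """
    def gain(gamma):
        return K_safe + gamma * (K_opt - K_safe)

    if not is_safe(A, B, gain(0.0), W, a, b, delta):
        raise ValueError("K_safe does not satisfy the chance constraint")
    if is_safe(A, B, gain(1.0), W, a, b, delta):
        return 1.0
    lo, hi = 0.0, 1.0  # invariant: gain(lo) is verified safe, gain(hi) is not
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if is_safe(A, B, gain(mid), W, a, b, delta):
            lo = mid
        else:
            hi = mid
    return lo


# Scalar toy system (all numbers are made up for illustration).
A = np.array([[0.9]])
B = np.array([[1.0]])
W = np.array([[1.0]])
K_safe = np.array([[-0.9]])  # cancels the dynamics, so Sigma = W
K_opt = np.array([[0.0]])    # hypothetical optimistic gain with larger covariance
a, b, delta = np.array([1.0]), 2.5, 0.05

gamma = scale_back(A, B, K_safe, K_opt, W, a, b, delta)
print(f"scaled-back interpolation weight: {gamma:.4f}")
```

Because the bisection only ever moves `lo` to a value that has passed `is_safe`, the returned weight is verified safe even if safety is not monotone in the weight; in that case it may simply not be the largest safe value.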