🤖 AI Summary
Existing online learning methods for the Linear Quadratic Regulator (LQR) rely on strong assumptions—particularly global excitability—to achieve theoretical guarantees, severely limiting their applicability.
Method: This paper proposes a novel approximate Thompson sampling algorithm that innovatively integrates preconditioned Langevin dynamics with an adaptive excitation mechanism. Crucially, it operates without assuming system excitability or other restrictive identifiability conditions.
Contribution/Results: The method establishes, for the first time without such assumptions, nontrivial concentration of the approximate posterior distribution. This enables a tight Bayesian regret analysis, yielding an $\tilde{O}(\sqrt{T})$ Bayesian regret upper bound, significantly improving upon prior approaches that require strong assumptions. By unifying perspectives from Bayesian reinforcement learning, stochastic control, and system identification, this work provides a more general and robust theoretical framework and algorithmic paradigm for online LQR learning.
📝 Abstract
We propose a novel Thompson sampling algorithm that learns linear quadratic regulators (LQR) with a Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a carefully designed preconditioner and incorporates a simple excitation mechanism. We show that the excitation signal drives the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the approximate posterior sampling process. Furthermore, we establish nontrivial concentration properties of the approximate posteriors generated by our algorithm. These properties enable us to bound the moments of the system state and attain an $O(\sqrt{T})$ regret bound without relying on the restrictive assumptions that are often used in the literature.
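To make the sampling machinery concrete, the sketch below runs preconditioned (unadjusted) Langevin dynamics to draw approximate posterior samples for a single unknown system parameter. This is an illustrative toy, not the paper's algorithm: the scalar system, the Gram-plus-prior preconditioner `V`, the step size `eta`, and the unit-noise posterior model are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy system (an assumption, not the paper's LQR setting):
# x_{t+1} = a* x_t + u_t + w_t, with known unit input gain; we sample
# the unknown scalar a* from its approximate posterior.
a_true = 0.7
T = 200
x = np.zeros(T + 1)
u = rng.normal(size=T)                  # simple excitation signal
for t in range(T):
    x[t + 1] = a_true * x[t] + u[t] + 0.1 * rng.normal()

z, y = x[:-1], x[1:] - u                # regressors and targets: y ~ a*z + noise
lam = 1.0                               # Gaussian prior precision (assumed)
V = lam + z @ z                         # preconditioner: prior + scalar Gram term

# Preconditioned unadjusted Langevin dynamics targeting the Gaussian
# posterior N((z @ y) / V, 1 / V) (unit observation noise assumed in the model).
a, eta = 0.0, 0.5                       # initialization and step size (assumed)
samples = []
for k in range(500):
    grad = z @ y - V * a                # gradient of the log posterior
    a += (eta / V) * grad + np.sqrt(2 * eta / V) * rng.normal()
    if k >= 200:                        # discard burn-in iterates
        samples.append(a)
a_hat = float(np.mean(samples))         # concentrates near a_true
```

Note how the excitation signal feeds the Gram term `z @ z`, so the preconditioner `V` grows with data; this shrinks the Langevin noise scale `sqrt(2 * eta / V)` and tightens the samples, a scalar analogue of the growing minimum eigenvalue described in the abstract.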