Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret

📅 2024-05-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing online learning methods for the Linear Quadratic Regulator (LQR) rely on strong assumptions—particularly global excitability—to achieve theoretical guarantees, severely limiting their applicability. Method: This paper proposes a novel approximate Thompson sampling algorithm that integrates preconditioned Langevin dynamics with an adaptive excitation mechanism. Crucially, it operates without assuming system excitability or other restrictive identifiability conditions. Contribution/Results: The method establishes, for the first time without such assumptions, nontrivial concentration of the approximate posterior distribution. This enables a tight Bayesian regret analysis, yielding an $\tilde{O}(\sqrt{T})$ Bayesian regret upper bound—significantly improving upon prior approaches requiring strong assumptions. By unifying perspectives from Bayesian reinforcement learning, stochastic control, and system identification, this work provides a more general and robust theoretical framework and algorithmic paradigm for online LQR learning.

📝 Abstract
We propose a novel Thompson sampling algorithm that learns linear quadratic regulators (LQR) with a Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a carefully designed preconditioner and incorporates a simple excitation mechanism. We show that the excitation signal drives the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the approximate posterior sampling process. Furthermore, we establish nontrivial concentration properties of the approximate posteriors generated by our algorithm. These properties enable us to bound the moments of the system state and attain an $O(\sqrt{T})$ regret bound without relying on the restrictive assumptions that are often used in the literature.
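The abstract's core idea—sampling dynamics parameters from an approximate posterior via preconditioned Langevin dynamics—can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes a Gaussian noise/prior model so the negative log-posterior is quadratic, and it uses the regularized regressor Gram matrix as the preconditioner; the function name and all parameters are hypothetical.

```python
import numpy as np

def preconditioned_langevin_sample(Z, X_next, theta0, n_steps=500, step=1e-3, lam=1.0, seed=None):
    """Hedged sketch: approximate posterior sampling for linear dynamics
    x_{t+1} = theta^T z_t + w_t, with regressors z_t = [x_t; u_t] stacked
    in Z (T x d) and next states stacked in X_next (T x n).

    Under a Gaussian prior and noise, the negative log-posterior is
    quadratic: U(theta) = 0.5*||Z theta - X_next||^2 + 0.5*lam*||theta||^2.
    We run Langevin dynamics preconditioned by P = Z^T Z + lam*I; as data
    (excitation) accumulates, P's minimum eigenvalue grows, which is the
    mechanism the abstract credits with accelerating sampling.
    """
    rng = np.random.default_rng(seed)
    P = Z.T @ Z + lam * np.eye(Z.shape[1])    # preconditioner (assumed form)
    P_inv = np.linalg.inv(P)
    P_inv_sqrt = np.linalg.cholesky(P_inv)    # factor L with L L^T = P^{-1}
    theta = theta0.copy()
    for _ in range(n_steps):
        grad = Z.T @ (Z @ theta - X_next) + lam * theta          # grad U(theta)
        noise = P_inv_sqrt @ rng.standard_normal(theta.shape)    # P^{-1/2}-shaped noise
        theta = theta - step * (P_inv @ grad) + np.sqrt(2.0 * step) * noise
    return theta
```

With more data in `Z`, the preconditioned step both contracts faster toward the ridge estimate and injects smaller sampling noise, mirroring the concentration behavior the abstract describes.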
Problem

Research questions and friction points this paper is trying to address.

Develop Thompson sampling for linear quadratic regulators
Achieve $O(\sqrt{T})$ regret with Langevin dynamics
Ensure posterior concentration without restrictive assumptions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel Thompson sampling for LQR
Langevin dynamics with preconditioner
Excitation mechanism accelerates sampling
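The third innovation—an excitation mechanism that makes the preconditioner's minimum eigenvalue grow—can be illustrated with a toy simulation. This is a hypothetical sketch, not the paper's adaptive scheme: it simply adds Gaussian noise to the control input and tracks the minimum eigenvalue of the cumulative regressor Gram matrix; all names and parameters are assumptions.

```python
import numpy as np

def run_with_excitation(A, B, K, T=2000, sigma=0.5, seed=None):
    """Hypothetical sketch: drive x_{t+1} = A x_t + B u_t + w_t with an
    excited control u_t = K x_t + sigma * eta_t, and record the minimum
    eigenvalue of the Gram matrix sum_t z_t z_t^T over z_t = [x_t; u_t].
    Persistent excitation should make this eigenvalue grow with t.
    """
    rng = np.random.default_rng(seed)
    n, m = B.shape
    x = np.zeros(n)
    gram = np.zeros((n + m, n + m))
    lam_min = []
    for _ in range(T):
        u = K @ x + sigma * rng.standard_normal(m)          # excited control
        z = np.concatenate([x, u])
        gram += np.outer(z, z)                              # accumulate z z^T
        lam_min.append(np.linalg.eigvalsh(gram)[0])         # smallest eigenvalue
        x = A @ x + B @ u + 0.1 * rng.standard_normal(n)    # noisy dynamics
    return np.array(lam_min)
```

Because each added term `z z^T` is positive semidefinite, the recorded eigenvalue sequence is nondecreasing; with persistent excitation it grows roughly linearly in `t`, which is what makes the preconditioned sampling step increasingly well conditioned.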
Yeoneung Kim
SeoulTech
mathematics, machine learning
Gihun Kim
Department of Electrical and Computer Engineering and ASRI, Seoul National University, Seoul, 08826, South Korea
Jiwhan Park
Insoon Yang
Seoul National University, Electrical and Computer Engineering
stochastic control, optimization, reinforcement learning