Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator

📅 2024-06-08

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies the risk-sensitive linear quadratic regulator (RS-LQR) control problem in a finite-horizon, episodic online adaptive setting. Because the system dynamics are unknown and must be learned online, we propose two algorithms: a greedy controller based on least-squares estimation, and an exploration-enhanced variant that injects excitation noise. We establish the first regret bounds for episodic RS-LQR and distinguish two regimes: under an identifiability assumption, the greedy algorithm achieves $\tilde{O}(\log N)$ regret; without that assumption, the exploration-enhanced variant attains a sublinear $\tilde{O}(\sqrt{N})$ bound, where $N$ is the number of episodes. Key technical contributions include perturbation analysis of the risk-sensitive Riccati equation, a precise characterization of the performance loss incurred by running suboptimal controllers during learning, and a principled design of the exploration–exploitation trade-off. To our knowledge, this is the first set of regret bounds for episodic RS-LQR.
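For reference, the quantities behind these bounds can be written as follows. This is a sketch under assumed notation (an exponential-of-quadratic criterion with risk parameter $\gamma$, horizon $H$, Gaussian noise covariance $W$, and policy $\pi_n$ used in episode $n$); the paper's exact definitions may differ.

```latex
x_{t+1} = A x_t + B u_t + w_t, \qquad w_t \sim \mathcal{N}(0, W), \qquad t = 0, \dots, H-1

J_\gamma(\pi) = \frac{2}{\gamma} \log \mathbb{E}_\pi\!\left[\exp\!\left(\frac{\gamma}{2}\Big(\sum_{t=0}^{H-1} \big(x_t^\top Q x_t + u_t^\top R u_t\big) + x_H^\top Q_H x_H\Big)\right)\right]

\mathrm{Regret}(N) = \sum_{n=1}^{N} \big(J_\gamma(\pi_n) - J_\gamma(\pi^\star)\big)
```

Under this reading, an $\tilde{O}(\log N)$ bound means the per-episode excess cost $J_\gamma(\pi_n) - J_\gamma(\pi^\star)$ decays roughly like $1/n$, while $\tilde{O}(\sqrt{N})$ corresponds to a decay of roughly $1/\sqrt{n}$.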

📝 Abstract

The risk-sensitive linear quadratic regulator is one of the most fundamental problems in risk-sensitive optimal control. In this paper, we study online adaptive control of the risk-sensitive linear quadratic regulator in the finite-horizon episodic setting. We propose a simple least-squares greedy algorithm and show that it achieves $\widetilde{\mathcal{O}}(\log N)$ regret under a specific identifiability assumption, where $N$ is the total number of episodes. If the identifiability assumption is not satisfied, we propose incorporating exploration noise into the least-squares-based algorithm, resulting in an algorithm with $\widetilde{\mathcal{O}}(\sqrt{N})$ regret. To the best of our knowledge, this is the first set of regret bounds for the episodic risk-sensitive linear quadratic regulator. Our proof relies on perturbation analysis of less-standard Riccati equations for risk-sensitive linear quadratic control, and on a delicate analysis of the loss in the risk-sensitive performance criterion incurred by applying suboptimal controllers during online learning.
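The "less-standard" Riccati equations mentioned above arise from the exponential-of-quadratic criterion. As a minimal sketch, assuming the common LEQG formulation with Gaussian process noise $w_t \sim \mathcal{N}(0, W)$ and risk parameter $\gamma$ (assumed notation, not necessarily the paper's), each backward step first forms a risk-adjusted matrix $\tilde P = P + \gamma P (W^{-1} - \gamma P)^{-1} P$ and then applies the usual LQR update to it. The function name `rs_riccati` and its argument list are hypothetical.

```python
import numpy as np


def rs_riccati(A, B, Q, R, QH, W, gamma, H):
    """Backward risk-sensitive (LEQG) Riccati recursion over a horizon of H steps.

    Assumed model (illustrative; the paper's exact convention may differ):
        x_{t+1} = A x_t + B u_t + w_t,  w_t ~ N(0, W)
        J = (2/gamma) log E exp(gamma/2 * [sum_t (x_t'Q x_t + u_t'R u_t) + x_H'QH x_H])
    gamma > 0 is risk-averse; gamma = 0 reduces to the standard LQR recursion.
    Returns gains K[0..H-1] (u_t = K[t] x_t) and value matrices P[0..H].
    """
    W_inv = np.linalg.inv(W)
    P = [None] * (H + 1)
    K = [None] * H
    P[H] = QH
    for t in range(H - 1, -1, -1):
        Pn = P[t + 1]
        M = W_inv - gamma * Pn
        # Well-posedness: W^{-1} - gamma * P_{t+1} must be positive definite,
        # otherwise the exponential-of-quadratic expectation is infinite.
        if np.linalg.eigvalsh(M).min() <= 0:
            raise ValueError(f"gamma too large: criterion undefined at step {t}")
        # Risk-adjusted cost-to-go: P_tilde = P + gamma * P (W^{-1} - gamma P)^{-1} P
        P_tilde = Pn + gamma * Pn @ np.linalg.solve(M, Pn)
        G = R + B.T @ P_tilde @ B
        K[t] = -np.linalg.solve(G, B.T @ P_tilde @ A)
        # P_t = Q + A' P_tilde A - A' P_tilde B G^{-1} B' P_tilde A
        P_t = Q + A.T @ P_tilde @ A + A.T @ P_tilde @ B @ K[t]
        P[t] = (P_t + P_t.T) / 2  # symmetrize against round-off
    return K, P


if __name__ == "__main__":
    # Discretized double integrator (illustrative numbers only).
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    Q, R, W = np.eye(2), np.eye(1), 0.01 * np.eye(2)
    K_rs, _ = rs_riccati(A, B, Q, R, Q, W, gamma=0.5, H=20)
    K_lq, _ = rs_riccati(A, B, Q, R, Q, W, gamma=0.0, H=20)
    print("risk-sensitive K_0:", K_rs[0])
    print("risk-neutral   K_0:", K_lq[0])
```

Setting `gamma=0` recovers the risk-neutral LQR recursion, which makes a convenient sanity check. The `ValueError` branch reflects the standard requirement that $W^{-1} - \gamma P_{t+1}$ remain positive definite for the criterion to be finite.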

Problem

Research questions and friction points this paper is trying to address.

Regret bounds for episodic risk-sensitive control

Online adaptive control in finite horizon

Least-squares greedy algorithm performance analysis
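The least-squares step of the greedy algorithm can be sketched as a lightly ridge-regularized regression of $x_{t+1}$ on $[x_t; u_t]$, pooled over all episodes observed so far. This assumes fully observed states and the linear model $x_{t+1} = A x_t + B u_t + w_t$; the function name `estimate_dynamics` and the ridge term `reg` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def estimate_dynamics(X, U, X_next, reg=1e-6):
    """Ridge-regularized least-squares estimate of (A, B) from stacked transitions.

    X: (T, n) states x_t, U: (T, m) inputs u_t, X_next: (T, n) next states x_{t+1},
    pooled over all time steps of all episodes observed so far.
    Solves min_{A,B} sum_t ||x_{t+1} - A x_t - B u_t||^2 + reg * ||[A B]||_F^2.
    """
    n = X.shape[1]
    Z = np.hstack([X, U])                        # regressors z_t = [x_t; u_t]
    gram = Z.T @ Z + reg * np.eye(Z.shape[1])    # ridge term keeps the system solvable
    coef_T = np.linalg.solve(gram, Z.T @ X_next)  # shape (n + m, n), estimates [A B]^T
    coef = coef_T.T                               # shape (n, n + m), estimates [A B]
    return coef[:, :n], coef[:, n:]
```

In a greedy (certainty-equivalent) scheme, the returned $(\hat A, \hat B)$ would stand in for the unknown $(A, B)$ when computing the controller gains for the next episode.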

Innovation

Methods, ideas, or system contributions that make the work stand out.

Least-squares greedy algorithm

Exploration noise incorporation

Perturbation analysis of Riccati equations
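The exploration-enhanced variant differs from the greedy one only in how inputs are generated: excitation noise is added on top of the certainty-equivalent feedback. A minimal sketch follows; `rollout_episode`, `exploration_scale`, and the decay schedule $\sigma_n = \sigma_0 n^{-1/4}$ are hypothetical choices for illustration, since the summary does not state the paper's noise schedule.

```python
import numpy as np


def rollout_episode(A, B, K, x0, W, sigma, rng):
    """Simulate one episode under u_t = K[t] x_t + sigma * eta_t, eta_t ~ N(0, I).

    sigma = 0 gives the purely greedy controller; sigma > 0 injects excitation
    noise so that u_t is no longer a deterministic linear function of x_t.
    Returns the (X, U, X_next) transition arrays consumed by a least-squares estimator.
    """
    n, m = B.shape
    x = np.asarray(x0, dtype=float)
    X, U, X_next = [], [], []
    for K_t in K:
        u = K_t @ x + sigma * rng.standard_normal(m)        # feedback + excitation
        w = rng.multivariate_normal(np.zeros(n), W)          # process noise
        x_next = A @ x + B @ u + w
        X.append(x)
        U.append(u)
        X_next.append(x_next)
        x = x_next
    return np.array(X), np.array(U), np.array(X_next)


def exploration_scale(episode, sigma0=1.0):
    """Hypothetical decaying schedule sigma_n = sigma0 * n^(-1/4), i.e. variance ~ n^(-1/2)."""
    return sigma0 * episode ** -0.25
```

With purely greedy inputs $u_t = K_t x_t$, each regressor $[x_t; u_t]$ lies in the column space of $[I; K_t]$, so the data alone need not pin down $(A, B)$. This is presumably the gap that the identifiability assumption closes in the greedy case, and that injected noise addresses otherwise.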

🔎 Similar Papers

Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret