Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator

📅 2024-06-08
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies the risk-sensitive linear quadratic regulator (RS-LQR) control problem in a finite-horizon, episodic online adaptive setting. Because the system dynamics are unknown and must be learned online, the authors propose two algorithms: a greedy controller based on least-squares estimation, and a variant that injects exploration noise into the control input. The paper establishes the first regret bounds for episodic RS-LQR and separates two regimes: under a specific identifiability assumption, the greedy algorithm achieves $\tilde{O}(\log N)$ regret, where $N$ is the number of episodes; without that assumption, the exploration-noise variant attains a sublinear $\tilde{O}(\sqrt{N})$ bound. The key technical ingredients are a perturbation analysis of the risk-sensitive Riccati equation, a characterization of the performance loss incurred by applying a suboptimal controller during learning, and the exploration–exploitation trade-off that sets the noise level. According to the authors, this is the first work on online adaptive RS-LQR control with provable regret guarantees.

📝 Abstract
Risk-sensitive linear quadratic regulator is one of the most fundamental problems in risk-sensitive optimal control. In this paper, we study online adaptive control of risk-sensitive linear quadratic regulator in the finite horizon episodic setting. We propose a simple least-squares greedy algorithm and show that it achieves $\widetilde{\mathcal{O}}(\log N)$ regret under a specific identifiability assumption, where $N$ is the total number of episodes. If the identifiability assumption is not satisfied, we propose incorporating exploration noise into the least-squares-based algorithm, resulting in an algorithm with $\widetilde{\mathcal{O}}(\sqrt{N})$ regret. To our best knowledge, this is the first set of regret bounds for episodic risk-sensitive linear quadratic regulator. Our proof relies on perturbation analysis of less-standard Riccati equations for risk-sensitive linear quadratic control, and a delicate analysis of the loss in the risk-sensitive performance criterion due to applying the suboptimal controller in the online learning process.
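The least-squares step described in the abstract can be illustrated with a minimal sketch. It assumes linear dynamics of the form x_{t+1} = A x_t + B u_t + w_t with Gaussian noise, which the abstract does not spell out. The function names `estimate_dynamics` and `rollout`, the ridge term `reg`, and the noise scales are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def estimate_dynamics(episodes, reg=1e-6):
    """Ridge least-squares estimate of (A, B) from pooled episodes.

    episodes: list of (xs, us) pairs with xs of shape (T+1, n) and
    us of shape (T, m). Each episode is stacked separately so that no
    transition is formed across an episode boundary.
    The small ridge term `reg` is an illustrative assumption.
    """
    Z = np.vstack([np.hstack([xs[:-1], us]) for xs, us in episodes])
    Y = np.vstack([xs[1:] for xs, _ in episodes])
    n = Y.shape[1]
    gram = Z.T @ Z + reg * np.eye(Z.shape[1])
    theta = np.linalg.solve(gram, Z.T @ Y)  # theta.T equals [A B]
    return theta[:n].T, theta[n:].T


def rollout(A, B, K_list, x0, rng, noise_std=0.1, explore_std=0.0):
    """Simulate one episode under u_t = K_t x_t + exploration noise.

    explore_std = 0 corresponds to a purely greedy controller;
    explore_std > 0 adds Gaussian excitation to the input.
    """
    n, m = B.shape
    xs, us = [np.asarray(x0, dtype=float)], []
    for K in K_list:
        u = K @ xs[-1] + explore_std * rng.standard_normal(m)
        x_next = A @ xs[-1] + B @ u + noise_std * rng.standard_normal(n)
        us.append(u)
        xs.append(x_next)
    return np.array(xs), np.array(us)
```

In an episodic loop, one would call `rollout` with the current gains, append the result to the list of episodes, and re-run `estimate_dynamics` before the next episode. How the exploration scale should shrink with the episode index is determined by the paper's analysis and is not reproduced here.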
Problem

Research questions and friction points this paper is trying to address.

Regret bounds for episodic risk-sensitive control
Online adaptive control in finite horizon
Least-squares greedy algorithm performance analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Least-squares greedy algorithm
Exploration noise incorporation
Perturbation analysis of risk-sensitive Riccati equations
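For readers unfamiliar with the risk-sensitive Riccati equation, the following is a minimal sketch of the classical finite-horizon recursion for the linear exponential-quadratic problem. It assumes the cost convention (2/θ) log E exp((θ/2) Σ (x'Qx + u'Ru)) with process noise covariance W; the paper's exact normalization and notation may differ. The function name `risk_sensitive_riccati` and its arguments are hypothetical.

```python
import numpy as np


def risk_sensitive_riccati(A, B, Q, R, W, theta, horizon, Q_T=None):
    """Backward recursion for risk-sensitive LQR gains (assumed convention).

    theta > 0 is risk-averse, theta = 0 recovers the standard LQR
    recursion. Returns (gains, P_0) with gains[t] = K_t and u_t = K_t x_t.
    """
    n = A.shape[0]
    P = np.array(Q if Q_T is None else Q_T, dtype=float)
    gains = []
    for _ in range(horizon):
        # Well-posedness: W^{-1} - theta * P must be positive definite,
        # i.e. every eigenvalue of theta * P W must be below 1.
        if np.max(np.linalg.eigvals(theta * P @ W).real) >= 1.0:
            raise ValueError("theta too large: risk-sensitive cost is infinite")
        # Risk-adjusted cost-to-go: (P^{-1} - theta W)^{-1} = (I - theta P W)^{-1} P
        P_tilde = np.linalg.solve(np.eye(n) - theta * P @ W, P)
        S = R + B.T @ P_tilde @ B
        K = -np.linalg.solve(S, B.T @ P_tilde @ A)
        P = Q + A.T @ P_tilde @ (A + B @ K)
        P = 0.5 * (P + P.T)  # symmetrize against round-off
        gains.append(K)
    gains.reverse()  # gains were computed from the last stage backward
    return gains, P
```

A certainty-equivalent (greedy) controller would plug least-squares estimates of A and B into this recursion in place of the true matrices. The perturbation analysis mentioned above concerns how errors in those estimates propagate through the recursion to the gains.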
Wenhao Xu
Unknown affiliation
Xuefeng Gao
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
Xuedong He
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China