🤖 AI Summary
This paper addresses the online linear quadratic regulator (LQR) control problem under unknown system dynamics. To overcome the poor generalization and high computational complexity that conventional model-based reinforcement learning approaches, such as those relying on optimism or Thompson sampling, exhibit in continuous control settings, we introduce the Confusing Instance (CI) principle and the Minimum Empirical Divergence (MED) framework into LQR control for the first time. Combining system identification, analysis of the LQR policy structure, controller sensitivity, and closed-loop stability theory, we propose the MED-LQ algorithm and establish a sublinear regret bound for it. Empirical evaluation demonstrates that MED-LQ matches or exceeds state-of-the-art methods across multiple benchmark tasks while showing strong potential to scale to large continuous control problems.
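To make the MED idea concrete, here is a minimal sketch of an MED-style sampling rule in a Gaussian multi-armed bandit, the setting where the CI principle originates. The function name, the known-variance assumption, and the exact weighting are illustrative only, not the paper's LQR construction: arms are sampled with probability decaying in the empirical divergence to the confusing instance in which they would be optimal.

```python
import numpy as np

def med_probabilities(means, counts, sigma=1.0):
    """Illustrative MED sampling distribution for a Gaussian bandit.

    Each arm's weight decays exponentially with the empirical KL
    divergence between its estimated mean and the current best arm,
    scaled by its pull count. This is the Confusing Instance idea:
    arms that could still plausibly be optimal keep being explored.
    """
    best = np.max(means)
    # KL divergence between N(mu_a, sigma^2) and N(mu_*, sigma^2)
    div = (best - means) ** 2 / (2.0 * sigma**2)
    weights = np.exp(-counts * div)
    return weights / weights.sum()

# Example: arm 1 looks best; arm 0 is close and under-explored,
# so it retains a non-trivial sampling probability.
means = np.array([0.9, 1.0, 0.2])
counts = np.array([5, 50, 50])
print(med_probabilities(means, counts))
```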
📝 Abstract
We revisit the problem of controlling linear systems with quadratic cost under unknown dynamics using model-based reinforcement learning. Traditional methods like Optimism in the Face of Uncertainty and Thompson Sampling, rooted in multi-armed bandits (MABs), face practical limitations in this setting. In contrast, we propose an alternative based on the Confusing Instance (CI) principle, which underpins regret lower bounds in MABs and discrete Markov Decision Processes (MDPs) and is central to the Minimum Empirical Divergence (MED) family of algorithms, known for their asymptotic optimality in various settings. By leveraging the structure of LQR policies along with sensitivity and stability analysis, we develop MED-LQ. This novel control strategy extends the principles of CI and MED beyond small-scale settings. Our benchmarks on a comprehensive control suite demonstrate that MED-LQ achieves competitive performance in various scenarios while highlighting its potential for broader applications in large-scale MDPs.
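For readers unfamiliar with the control side, the sketch below shows the certainty-equivalent primitive that model-based LQR methods build on: solve the discrete algebraic Riccati equation for estimated dynamics and read off the feedback gain. The helper name and example system are hypothetical; the paper's CI/MED machinery would sit on top of this step, weighing candidate models and their induced controllers by empirical divergence rather than committing to a single estimate.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """Certainty-equivalent LQR gain for estimated dynamics (A, B).

    Solves the discrete algebraic Riccati equation for P, then returns
    K such that u_t = -K x_t minimizes the infinite-horizon quadratic
    cost sum_t (x_t' Q x_t + u_t' R u_t).
    """
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K

# Example: a marginally unstable double-integrator-like system,
# stabilized by the resulting state-feedback gain.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K = lqr_gain(A, B, Q, R)
print("gain:", K, "closed-loop eigs:", np.linalg.eigvals(A - B @ K))
```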