The Confusing Instance Principle for Online Linear Quadratic Control

📅 2025-10-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the online linear quadratic regulator (LQR) control problem under unknown system dynamics. Conventional model-based reinforcement learning approaches, such as those relying on optimism or Thompson sampling, face practical limitations in continuous control settings. To address them, we bring the Confusing Instance (CI) principle and the Minimum Empirical Divergence (MED) framework to LQR control. Combining system identification, the structure of LQR policies, controller sensitivity analysis, and closed-loop stability theory, we propose the MED-LQ algorithm. We establish a sublinear regret bound for MED-LQ. Empirical evaluation on a control benchmark suite shows that MED-LQ is competitive with existing methods across a range of scenarios and suggests potential for scaling to larger continuous control problems.

📝 Abstract
We revisit the problem of controlling linear systems with quadratic cost under unknown dynamics with model-based reinforcement learning. Traditional methods like Optimism in the Face of Uncertainty and Thompson Sampling, rooted in multi-armed bandits (MABs), face practical limitations. In contrast, we propose an alternative based on the Confusing Instance (CI) principle, which underpins regret lower bounds in MABs and discrete Markov Decision Processes (MDPs) and is central to the Minimum Empirical Divergence (MED) family of algorithms, known for their asymptotic optimality in various settings. By leveraging the structure of LQR policies along with sensitivity and stability analysis, we develop MED-LQ. This novel control strategy extends the principles of CI and MED beyond small-scale settings. Our benchmarks on a comprehensive control suite demonstrate that MED-LQ achieves competitive performance in various scenarios while highlighting its potential for broader applications in large-scale MDPs.
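
For readers who want the setting written out, the block below states the standard online LQR formulation that the abstract refers to. The abstract itself does not define any symbols, so the notation here ($A$, $B$, $Q$, $R$, $w_t$, $J^\star$, $\mathrm{Regret}_T$) is an assumption borrowed from the usual textbook presentation and may differ from the paper's.

```latex
% Linear dynamics with unknown (A, B) and process noise w_t
x_{t+1} = A x_t + B u_t + w_t

% Quadratic stage cost, with Q \succeq 0 and R \succ 0
c_t = x_t^\top Q x_t + u_t^\top R u_t

% Regret against the optimal average cost J^\star(A, B) of the true system
\mathrm{Regret}_T = \sum_{t=1}^{T} c_t - T \, J^\star(A, B)

% A "sublinear regret bound" means \mathrm{Regret}_T = o(T)
```

Under this reading, a learner with sublinear regret attains, on average, the same per-step cost as a controller that knows $(A, B)$ from the start.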
Problem

Research questions and friction points this paper is trying to address.

Controlling linear systems with quadratic cost under unknown dynamics
Overcoming limitations of traditional optimism-based reinforcement learning methods
Extending the Confusing Instance principle to large-scale control problems
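
To make the first point concrete, here is a minimal sketch of the plain certainty-equivalent baseline for LQR under unknown dynamics: estimate the system matrices by least squares from one excited trajectory, then compute the LQR gain for the estimate. This is not MED-LQ and does not use the Confusing Instance principle. The example system (a discretized double integrator), the noise level, the horizon, and the helper names `estimate_dynamics` and `lqr_gain` are all assumptions chosen for illustration.

```python
import numpy as np

def estimate_dynamics(xs, us):
    """Least-squares fit of x_{t+1} = A x_t + B u_t from one trajectory.

    xs has shape (T + 1, n), us has shape (T, m).
    """
    Z = np.hstack([xs[:-1], us])                    # regressors [x_t, u_t]
    Y = xs[1:]                                      # targets x_{t+1}
    theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)   # shape (n + m, n)
    n = xs.shape[1]
    return theta[:n].T, theta[n:].T                 # A_hat, B_hat

def lqr_gain(A, B, Q, R, iters=1000):
    """Iterate the discrete-time Riccati recursion; the control is u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        P = 0.5 * (P + P.T)                         # keep P numerically symmetric
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Hypothetical true system, unknown to the learner.
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)

# Collect data with random exploratory inputs.
T = 200
xs = np.zeros((T + 1, 2))
us = rng.normal(size=(T, 1))
for t in range(T):
    w = 0.01 * rng.normal(size=2)                   # small process noise
    xs[t + 1] = A_true @ xs[t] + B_true @ us[t] + w

A_hat, B_hat = estimate_dynamics(xs, us)
K, _ = lqr_gain(A_hat, B_hat, Q, R)

# Check that the gain computed from the estimate stabilizes the true system.
rho = max(abs(np.linalg.eigvals(A_true - B_true @ K)))
print("closed-loop spectral radius on the true system:", rho)
```

The baseline acts as if the estimate were exact. Optimism, Thompson sampling, and the MED approach described in this paper differ in how they account for the remaining uncertainty in the estimate when choosing the controller.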
Innovation

Methods, ideas, or system contributions that make the work stand out.

MED-LQ extends the Confusing Instance principle and MED beyond small-scale settings
Leverages the structure of LQR policies together with sensitivity and stability analysis
Achieves competitive performance in control benchmarks