Online Learning for Supervisory Switching Control

📅 2026-03-15
🤖 AI Summary
This work addresses online selection of the best controller for partially observed linear dynamical systems without assuming stability in advance. The authors propose a supervisory switching control framework grounded in multi-armed bandit theory. Its observability-based scoring mechanism isolates the influence of historical states, so each candidate controller can be evaluated from data even when earlier controllers have left the system in a poor state. The study establishes non-asymptotic, dimension-free finite-time performance guarantees for this setting, avoiding the system-stability assumptions that existing non-asymptotic methods in online learning and system identification require. Two algorithmic variants are introduced, each of which identifies the most suitable controller within $\mathcal{O}(N \log N)$ time steps while ensuring a finite $L_2$-gain with respect to disturbances.
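For readers unfamiliar with the terminology, the setting and the $L_2$-gain property can be written out as follows. This is the standard textbook formulation, not notation taken from the paper: the matrices $A$, $B$, $C$, the noise terms $w_t$, $v_t$, and the constants $\gamma$, $\beta$ are assumed symbols introduced here for illustration.

$$
x_{t+1} = A x_t + B u_t + w_t, \qquad y_t = C x_t + v_t,
$$

$$
\sum_{t=0}^{T} \|y_t\|^2 \;\le\; \gamma^2 \sum_{t=0}^{T} \|w_t\|^2 + \beta \quad \text{for all } T \ge 0 .
$$

The first line is a partially observed linear system: only the output $y_t$ is measured, not the state $x_t$. The second line says the closed loop has finite $L_2$-gain (at most $\gamma$) from the disturbance to the output, with $\beta \ge 0$ absorbing the effect of the initial condition.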

📝 Abstract
We study supervisory switching control for partially-observed linear dynamical systems. The objective is to identify and deploy the best controller for the unknown system by periodically selecting among a collection of $N$ candidate controllers, some of which may destabilize the underlying system. While classical estimator-based supervisory control guarantees asymptotic stability, it lacks quantitative finite-time performance bounds. Conversely, current non-asymptotic methods in both online learning and system identification require restrictive assumptions that are incompatible with a control setting, such as system stability, which preclude testing potentially unstable controllers. To bridge this gap, we propose a novel, non-asymptotic analysis of supervisory control that adapts multi-armed bandit algorithms to address these control-theoretic challenges. Our data-driven algorithm evaluates candidate controllers via scoring criteria that leverage system observability to isolate the effects of historical states, enabling both detection of destabilizing controllers and accurate system identification. We present two algorithmic variants with dimension-free, finite-time guarantees, where each identifies the most suitable controller in $\mathcal{O}(N \log N)$ steps, while simultaneously achieving finite $L_2$-gain with respect to system disturbances.
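
The general idea of testing candidates in epochs and discarding destabilizing ones can be illustrated with a toy sketch. This is not the paper's algorithm: the scalar plant, the static output-feedback gains, the growth test, the normalized-energy score, and every name and threshold below (`run_epoch`, `select_controller`, the factor `10.0`, the floor `0.1`) are assumptions made up for this example. The score divides by the output energy at the start of the epoch, a crude stand-in for isolating the effect of the state inherited from earlier controllers.

```python
import random


def run_epoch(a, b, k, x, steps, rng, noise=0.01):
    """Apply u = -k*y for `steps` steps; return the final state and the outputs seen."""
    ys = []
    for _ in range(steps):
        y = x + rng.gauss(0.0, noise)  # noisy measurement of the state
        x = a * x + b * (-k * y) + rng.gauss(0.0, noise)  # plant update with process noise
        ys.append(y)
    return x, ys


def select_controller(a, b, gains, epoch_len=30, rounds=5, seed=0):
    """Round-robin over candidate gains, eliminating those whose output blows up."""
    rng = random.Random(seed)
    x = 1.0  # the state carries over between epochs, as on a real plant
    active = list(range(len(gains)))
    scores = {i: [] for i in active}
    for _ in range(rounds):
        for i in list(active):  # iterate over a copy; `active` may shrink
            x, ys = run_epoch(a, b, gains[i], x, epoch_len, rng)
            # Hypothetical growth test: output grew far beyond where the epoch started.
            if abs(ys[-1]) > 10.0 * (abs(ys[0]) + 0.1):
                active.remove(i)
                continue
            # Output energy normalized by the starting output, so a controller is
            # not blamed for a large state left behind by its predecessor.
            energy = sum(y * y for y in ys) / len(ys)
            scores[i].append(energy / (ys[0] ** 2 + 1.0))
    if not active:
        return None, active
    best = min(active, key=lambda i: sum(scores[i]) / len(scores[i]))
    return best, active


if __name__ == "__main__":
    # Open-loop unstable plant (a = 1.2); closed-loop pole is a - b*k.
    gains = [0.0, 0.7, 1.2, 3.0]  # poles: 1.2, 0.5, 0.0, -1.8
    best, survivors = select_controller(1.2, 1.0, gains)
    print("survivors:", survivors, "selected gain:", gains[best])
```

In this example the gains `0.0` and `3.0` give closed-loop poles outside the unit circle, so they should be eliminated in the first round, and the remaining candidates are ranked by average normalized output energy. A real supervisory scheme for a multi-dimensional, partially observed system would need the observability-based scoring and the switching schedule described in the paper rather than this scalar shortcut.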
Problem

Research questions and friction points this paper is trying to address.

supervisory switching control
partially-observed linear dynamical systems
online learning
finite-time performance
controller selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

supervisory switching control
multi-armed bandits
non-asymptotic analysis
partially-observed linear systems
finite-time guarantees