Logarithmic Regret for Nonlinear Control

📅 2025-01-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses online learning control of unknown nonlinear dynamical systems in high-stakes applications such as robotics and healthcare, aiming to minimize both instantaneous tracking error and cumulative regret. We propose the first online adaptive control framework applicable to systems with *parameter-dependent nonlinearities*. Assuming the dynamics depend smoothly on the unknown parameters and the optimal policy is persistently exciting, we establish the first logarithmic regret bound, $O(\log T)$; when the optimal policy is not persistently exciting, the bound degrades to $O(\sqrt{T})$. Our method integrates persistent excitation analysis, online nonlinear parameter estimation, and adaptive optimization theory, rigorously distinguishing between excitable and non-excitable optimal policies. Simulations on a simple dynamical system are consistent with the regret trends the theory predicts.

📝 Abstract

We address the problem of learning to control an unknown nonlinear dynamical system through sequential interactions. Motivated by high-stakes applications in which mistakes can be catastrophic, such as robotics and healthcare, we study situations in which fast sequential learning is possible. Fast sequential learning is characterized by the ability of the learning agent to incur logarithmic regret relative to a fully informed baseline. We demonstrate that fast sequential learning is achievable in a diverse class of continuous control problems where the system dynamics depend smoothly on unknown parameters, provided the optimal control policy is persistently exciting. Additionally, we derive a regret bound that grows with the square root of the number of interactions for cases where the optimal policy is not persistently exciting. Our results provide the first regret bounds for controlling nonlinear dynamical systems depending nonlinearly on unknown parameters. We validate the trends our theory predicts in simulation on a simple dynamical system.
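To make the gap between logarithmic and square-root regret concrete, the following is a minimal numerical sketch, not the paper's algorithm or analysis. It assumes, purely for illustration, that per-step regret is proportional to the squared parameter error, and that this squared error decays like $1/t$ when the policy is persistently exciting and like $1/\sqrt{t}$ when it is not. The function names `cumulative_regret`, `excited`, and `unexcited` are hypothetical.

```python
import math


def cumulative_regret(per_step_regret, horizon):
    """Return the running sum of per_step_regret(t) for t = 1..horizon."""
    total = 0.0
    curve = []
    for t in range(1, horizon + 1):
        total += per_step_regret(t)
        curve.append(total)
    return curve


def excited(t):
    # Assumed rate: squared estimation error shrinks like 1/t
    # (illustrative stand-in for the persistently exciting case).
    return 1.0 / t


def unexcited(t):
    # Assumed rate: squared estimation error shrinks like 1/sqrt(t)
    # (illustrative stand-in for the case without persistent excitation).
    return 1.0 / math.sqrt(t)


T = 10_000
log_curve = cumulative_regret(excited, T)     # grows like log T
sqrt_curve = cumulative_regret(unexcited, T)  # grows like sqrt T

# Print both curves at a few horizons to compare growth.
for horizon in (100, 1_000, 10_000):
    print(horizon, round(log_curve[horizon - 1], 2), round(sqrt_curve[horizon - 1], 2))
```

Under these assumed rates the first sum is a harmonic number, which stays within a constant of $\log T$, while the second stays within a constant of $2\sqrt{T}$, so the difference between the two regimes widens quickly as the number of interactions grows.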

Problem

Research questions and friction points this paper is trying to address.

Stable Control Learning

Complex System Behavior

Error Probability Minimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Complex System Control

Error Probability Estimation

Continuous Control Learning

🔎 Similar Papers

Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator

2024-06-08 | arXiv.org | Citations: 0