Parameter-Free Dynamic Regret for Unconstrained Linear Bandits

📅 2026-03-26
🤖 AI Summary
This work addresses the long-standing open problem of minimizing dynamic regret against an arbitrary comparator sequence in unconstrained adversarial linear bandits, where the learner receives only point-evaluation feedback on each round. The paper proposes an adaptive algorithmic framework that requires no prior knowledge of the number of comparator switches $S_T$ and achieves, for the first time in linear bandits, the optimal dynamic regret guarantee for any value of $S_T$. Building upon adaptive ensembling techniques from multi-armed bandits and incorporating a parameter-free design alongside a refined analysis of dynamic comparators, the method attains a dynamic regret upper bound of $\mathcal{O}(\sqrt{d(1+S_T)T})$, up to poly-logarithmic factors. This settles the question of optimal adaptation to $S_T$ in this setting.
📝 Abstract
We study dynamic regret minimization in unconstrained adversarial linear bandit problems. In this setting, a learner must minimize the cumulative loss relative to an arbitrary sequence of comparators $\boldsymbol{u}_1,\ldots,\boldsymbol{u}_T$ in $\mathbb{R}^d$, but receives only point-evaluation feedback on each round. We provide a simple approach to combining the guarantees of several bandit algorithms, allowing us to optimally adapt to the number of switches $S_T = \sum_t\mathbb{I}\{\boldsymbol{u}_t \neq \boldsymbol{u}_{t-1}\}$ of an arbitrary comparator sequence. In particular, we provide the first algorithm for linear bandits achieving the optimal regret guarantee of order $\mathcal{O}\big(\sqrt{d(1+S_T) T}\big)$ up to poly-logarithmic terms without prior knowledge of $S_T$, thus resolving a long-standing open problem.
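To make the quantities in the abstract concrete, the following minimal Python sketch computes the switch count $S_T$, the dynamic regret of a play sequence under linear losses, and the rate $\sqrt{d(1+S_T)T}$ without constants or logarithmic factors. It is an illustration only, not the paper's algorithm. The function names are hypothetical, the sum defining $S_T$ is assumed to start at $t=2$ (the abstract leaves the index range implicit), and dynamic regret is assumed to take the standard form $\sum_t \langle \boldsymbol{\ell}_t, \boldsymbol{w}_t - \boldsymbol{u}_t\rangle$ for loss vectors $\boldsymbol{\ell}_t$ and plays $\boldsymbol{w}_t$.

```python
import numpy as np


def count_switches(comparators):
    """S_T: number of rounds t >= 2 with u_t != u_{t-1} (assumed index range)."""
    u = np.asarray(comparators, dtype=float)  # shape (T, d)
    # A round counts as a switch if any coordinate differs from the previous round.
    return int(np.any(u[1:] != u[:-1], axis=1).sum())


def dynamic_regret(losses, plays, comparators):
    """sum_t <l_t, w_t - u_t> for linear losses (assumed standard definition)."""
    l = np.asarray(losses, dtype=float)
    w = np.asarray(plays, dtype=float)
    u = np.asarray(comparators, dtype=float)
    return float(np.sum(l * (w - u)))


def switch_rate(d, T, S):
    """sqrt(d * (1 + S) * T): the stated rate, ignoring constants and log factors."""
    return float(np.sqrt(d * (1 + S) * T))


# Toy example with d = 2 and T = 5: the comparator changes once, at round 3.
comparators = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
losses = np.ones((5, 2))   # hypothetical loss vectors
plays = np.zeros((5, 2))   # a learner that always plays the origin

S = count_switches(comparators)
print(S, dynamic_regret(losses, plays, comparators), switch_rate(2, 5, S))
```

In the bandit setting the learner would observe only the scalar $\langle \boldsymbol{\ell}_t, \boldsymbol{w}_t\rangle$ on each round, so a function like `dynamic_regret` is an analysis-side quantity rather than something the learner can evaluate.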
Problem

Research questions and friction points this paper is trying to address.

dynamic regret
linear bandits
unconstrained
adversarial
parameter-free
Innovation

Methods, ideas, or system contributions that make the work stand out.

parameter-free
dynamic regret
linear bandits
adversarial setting
optimal adaptation