AI Summary
Standard analyses based on the eluder dimension cannot yield first-order regret bounds, which limits their applicability in reinforcement learning; the paper shows this via a lower bound on the eluder dimension of generalized linear model classes. To address this limitation, the work introduces a localized eluder dimension technique tailored to generalized linear models. By combining this localized analysis with the statistical structure of generalized linear models, the paper establishes the first genuine first-order regret bounds for finite-horizon reinforcement learning with bounded cumulative rewards. The approach also recovers and improves on classical results for Bernoulli bandits, broadening the theoretical scope and practical relevance of eluder dimension-based methods.
Abstract
We establish a lower bound on the eluder dimension of generalised linear model classes, showing that standard eluder dimension-based analysis cannot lead to first-order regret bounds. To address this, we introduce a localisation method for the eluder dimension; our analysis immediately recovers and improves on classic results for Bernoulli bandits, and allows for the first genuine first-order bounds for finite-horizon reinforcement learning tasks with bounded cumulative returns.
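As background (these definitions are standard in the bandit and RL literature but are not spelled out in the abstract itself), the two central notions can be sketched as follows. A regret bound is "first-order" if it scales with the optimal cumulative return $V^{\star}$ rather than with the horizon or reward range alone, so it becomes small on easy instances where $V^{\star}$ is small:

```latex
% First-order regret: the bound scales with the optimal value V*,
% e.g. (constants and log factors suppressed, d a complexity measure)
\[
  \mathrm{Reg}(T)
  \;=\; \sum_{t=1}^{T} \bigl( V^{\star} - V^{\pi_t} \bigr)
  \;=\; \widetilde{O}\!\bigl( \sqrt{d \, V^{\star} \, T} \bigr).
\]

% Eluder dimension: dim_E(F, eps) is the length of the longest
% sequence x_1, ..., x_n in which every x_i is eps-independent of
% its predecessors, i.e. there exist f, f' in F with
%   sum_{j < i} ( f(x_j) - f'(x_j) )^2 <= eps^2
% and yet  | f(x_i) - f'(x_i) | > eps.
```

The lower bound in the abstract says that for generalised linear model classes this dimension is too large for the standard analysis to produce a bound of the first-order form above, which is what motivates localising the eluder dimension to a neighbourhood of the optimal value.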