Eluder dimension: localise it!

πŸ“… 2026-01-14
πŸ“ˆ Citations: 1
✨ Influential: 0
πŸ€– AI Summary
Standard analyses based on the eluder dimension provably cannot yield first-order regret bounds, limiting their applicability in reinforcement learning. This work addresses that limitation by introducing a localised eluder dimension technique tailored to generalised linear model classes. By combining the localised analysis with the statistical structure of generalised linear models, the paper establishes the first first-order regret bounds for finite-horizon reinforcement learning with bounded cumulative returns. The approach also recovers and improves on classical results for Bernoulli bandits, broadening the theoretical scope and practical relevance of eluder dimension-based methods.

πŸ“ Abstract
We establish a lower bound on the eluder dimension of generalised linear model classes, showing that standard eluder dimension-based analysis cannot lead to first-order regret bounds. To address this, we introduce a localisation method for the eluder dimension; our analysis immediately recovers and improves on classic results for Bernoulli bandits, and allows for the first genuine first-order bounds for finite-horizon reinforcement learning tasks with bounded cumulative returns.
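To make the abstract's central object concrete: following the standard definition (Russo and Van Roy), a point x is ε-independent of a sequence x_1, …, x_k with respect to a class F if some pair f, g ∈ F satisfies √(Σᡢ (f(xα΅’) βˆ’ g(xα΅’))²) ≤ ε yet |f(x) βˆ’ g(x)| > ε; the ε-eluder dimension is the length of the longest sequence in which every point is ε-independent of its predecessors. The brute-force sketch below computes this for a small finite class; it is illustrative only (the function names and the toy indicator class are our own, not from the paper, and the search is exponential).

```python
import itertools
import math

def is_independent(x, prefix, fns, eps):
    # x is eps-independent of prefix if some pair of functions is close
    # (cumulative squared error at most eps^2) on the prefix yet differs
    # by more than eps at x. Functions are dicts mapping points to values.
    for f, g in itertools.combinations(fns, 2):
        gap = math.sqrt(sum((f[z] - g[z]) ** 2 for z in prefix))
        if gap <= eps and abs(f[x] - g[x]) > eps:
            return True
    return False

def eluder_dimension(domain, fns, eps):
    # Length of the longest sequence in which each point is
    # eps-independent of the points before it (depth-first search;
    # feasible only for toy problems).
    best = 0
    stack = [()]
    while stack:
        prefix = stack.pop()
        best = max(best, len(prefix))
        for x in domain:
            if is_independent(x, prefix, fns, eps):
                stack.append(prefix + (x,))
    return best
```

For example, for the class consisting of the zero function and the three point-indicators on a three-point domain, with ε = 0.5, every point can be taken in turn (the pair "indicator at x versus zero" witnesses its independence), giving eluder dimension 3.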
Problem

Research questions and friction points this paper is trying to address.

eluder dimension
first-order regret bounds
generalised linear models
reinforcement learning
cumulative returns
Innovation

Methods, ideas, or system contributions that make the work stand out.

eluder dimension
localisation
first-order regret
generalised linear models
reinforcement learning
Alireza Bakhtiari
University of Alberta
Alex Ayoub
Department of Computing Science, University of Alberta
Reinforcement Learning
Samuel Robertson
University of Alberta
David Janz
University of Oxford
statistics, machine learning, reinforcement learning
Csaba Szepesvári
University of Alberta