Variance-Dependent Regret Lower Bounds for Contextual Bandits

📅 2025-03-15

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This paper addresses an open problem: variance-dependent regret lower bounds for linear contextual bandits. It removes two limitations of Jia et al. (2024): a √d gap relative to known upper bounds, and a restriction to a fixed total variance budget Λ. The authors establish information-theoretic lower bounds for general (non-budgeted) noise variance sequences. For a prefixed variance sequence, revealed to the learner in advance, they derive an Ω(d√∑σₖ²/log K) regret lower bound. For an adaptive sequence, where an adversary chooses σₖ² from past observations but before seeing the decision set 𝒟ₖ, they derive Ω(d√∑σₖ²/log⁶(dK)). Both bounds match the upper bounds of the SAVE algorithm (Zhao et al., 2023) up to logarithmic factors, which closes the √d gap. The analysis explicitly accounts for how the variance sequence is generated, so it characterizes variance-dependent regret in linear contextual bandits up to logarithmic factors in both settings.
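
To make the √d gap concrete, here is a short worked comparison under two assumptions of ours. First, the total variance equals the budget, ∑σₖ² = Λ. Second, the eluder dimension is treated as order d, since linear function classes have eluder dimension Õ(d). Constants and the logarithmic factors hidden inside Õ are dropped. The first ratio compares the upper-bound scale with the Jia et al. (2024) lower bound. The second ratio compares it with the new prefixed-sequence lower bound.

$$
\frac{d\sqrt{\Lambda}}{\sqrt{d_{\mathrm{elu}}\,\Lambda}}
\;\overset{d_{\mathrm{elu}}\asymp d}{=}\;
\frac{d\sqrt{\Lambda}}{\sqrt{d}\,\sqrt{\Lambda}} = \sqrt{d},
\qquad
\frac{d\sqrt{\Lambda}}{d\sqrt{\Lambda}/\log K} = \log K .
$$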


📝 Abstract

Variance-dependent regret bounds for linear contextual bandits, which improve upon the classical $\tilde{O}(d\sqrt{K})$ regret bound to $\tilde{O}(d\sqrt{\sum_{k=1}^K\sigma_k^2})$, where $d$ is the context dimension, $K$ is the number of rounds, and $\sigma^2_k$ is the noise variance in round $k$, have been widely studied in recent years. However, most existing works focus on regret upper bounds rather than lower bounds. To our knowledge, the only lower bound is from Jia et al. (2024), which proved that for any eluder dimension $d_{\textbf{elu}}$ and total variance budget $\Lambda$, there exists an instance with $\sum_{k=1}^K\sigma_k^2\leq \Lambda$ for which any algorithm incurs a variance-dependent lower bound of $\Omega(\sqrt{d_{\textbf{elu}}\Lambda})$. However, this lower bound has a $\sqrt{d}$ gap with existing upper bounds. Moreover, it only considers a fixed total variance budget $\Lambda$ and does not apply to a general variance sequence $\{\sigma_1^2,\ldots,\sigma_K^2\}$. In this paper, to overcome the limitations of Jia et al. (2024), we consider the general variance sequence under two settings. For a prefixed sequence, where the entire variance sequence is revealed to the learner at the beginning of the learning process, we establish a variance-dependent lower bound of $\Omega(d \sqrt{\sum_{k=1}^K\sigma_k^2 }/\log K)$ for linear contextual bandits. For an adaptive sequence, where an adversary can generate the variance $\sigma_k^2$ in each round $k$ based on historical observations, we show that when the adversary must generate $\sigma_k^2$ before observing the decision set $\mathcal{D}_k$, a similar lower bound of $\Omega(d\sqrt{ \sum_{k=1}^K\sigma_k^2} /\log^6(dK))$ holds. In both settings, our results match the upper bounds of the SAVE algorithm (Zhao et al., 2023) up to logarithmic factors.
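
As a rough, illustrative sketch (not from the paper), the rates above can be compared numerically with constants dropped and the logarithmic factors inside Õ ignored. The function names are hypothetical. They return orders of magnitude, not regret values of any specific algorithm.

```python
import math
from typing import Sequence


def classical_rate(d: int, num_rounds: int) -> float:
    """Scale of the classical O~(d * sqrt(K)) bound (constants and logs dropped)."""
    return d * math.sqrt(num_rounds)


def variance_rate(d: int, variances: Sequence[float]) -> float:
    """Scale of the variance-dependent O~(d * sqrt(sum_k sigma_k^2)) bound."""
    return d * math.sqrt(sum(variances))


def prefixed_lower_rate(d: int, variances: Sequence[float]) -> float:
    """Scale of Omega(d * sqrt(sum_k sigma_k^2) / log K); requires K >= 2."""
    return variance_rate(d, variances) / math.log(len(variances))


def adaptive_lower_rate(d: int, variances: Sequence[float]) -> float:
    """Scale of Omega(d * sqrt(sum_k sigma_k^2) / log^6(dK)); requires d * K >= 2."""
    return variance_rate(d, variances) / math.log(d * len(variances)) ** 6


if __name__ == "__main__":
    d, K = 10, 10_000
    low_noise = [0.01] * K  # hypothetical: small, constant noise variance every round
    print(classical_rate(d, K))  # 1000.0
    print(variance_rate(d, low_noise))
    print(prefixed_lower_rate(d, low_noise))
    print(adaptive_lower_rate(d, low_noise))
```

With σₖ² = 0.01 in every round, ∑σₖ² = K/100, so the variance-dependent rate is one tenth of the classical rate. At sizes like these the log⁶(dK) factor in the adaptive bound is very large, so the comparison is only meaningful asymptotically.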

Problem

Research questions and friction points this paper is trying to address.

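Establishes variance-dependent regret lower bounds for linear contextual bandits

Closes the √d gap between the only existing variance-dependent lower bound (Jia et al., 2024) and known upper bounds, and extends it from a fixed total variance budget to general variance sequences

Considers both prefixed and adaptive variance sequence settings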
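
The two settings differ mainly in when the variance is fixed. The following is a minimal simulation of the adaptive interaction order described in the abstract. It is our own sketch, not the paper's lower-bound construction. It assumes Gaussian noise with variance σₖ², a linear mean reward ⟨θ, x⟩, and hypothetical callables `adversary_variance`, `decision_set_at`, and `learner`. A prefixed sequence corresponds to an adversary that ignores the history (for example `lambda h: sigmas[len(h)]`). The sketch does not model the learner receiving that sequence up front.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
History = List[Tuple[Vector, float]]  # (chosen action, observed reward) per round


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def run_adaptive_protocol(
    theta: Vector,
    num_rounds: int,
    adversary_variance: Callable[[History], float],        # hypothetical adversary
    decision_set_at: Callable[[int], List[Vector]],        # hypothetical D_k generator
    learner: Callable[[History, List[Vector]], Vector],    # hypothetical learner
    seed: int = 0,
) -> Tuple[History, List[float]]:
    rng = random.Random(seed)
    history: History = []
    variances: List[float] = []
    for k in range(num_rounds):
        # 1. The adversary commits to sigma_k^2 using only past observations,
        #    before the decision set D_k is revealed (the adaptive setting).
        sigma2 = adversary_variance(list(history))
        variances.append(sigma2)
        # 2. The decision set D_k is revealed.
        decisions = decision_set_at(k)
        # 3. The learner picks an action from D_k given the history.
        x = learner(list(history), decisions)
        # 4. Reward = linear mean + Gaussian noise with variance sigma_k^2 (assumed).
        noise = rng.gauss(0.0, math.sqrt(sigma2))
        history.append((x, dot(theta, x) + noise))
    return history, variances
```

Passing copies of the history keeps the adversary and learner from mutating the shared record, so each callable only sees what was observed before its turn.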

Innovation

Methods, ideas, or system contributions that make the work stand out.

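Prefixed-sequence lower bound of Ω(d√∑σₖ²/log K), when the full variance sequence is revealed to the learner in advance

Adaptive-sequence lower bound of Ω(d√∑σₖ²/log⁶(dK)), when an adversary picks σₖ² from past observations before the decision set is revealed

Matches the upper bounds of the SAVE algorithm (Zhao et al., 2023) up to logarithmic factors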

🔎 Similar Papers

Batched Nonparametric Contextual Bandits