Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards

📅 2025-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies stochastic linear bandits with rewards possessing finite (1+ε)-th central moments for some ε ∈ (0,1], i.e., heavy-tailed reward distributions. To tighten the minimax regret bounds under heavy tails, we propose a novel elimination algorithm integrating optimal experimental design, geometry-aware analysis, kernelization, and robust estimation via truncation and median aggregation. We establish an improved upper bound Õ(d^{(1+3ε)/(2(1+ε))} T^{1/(1+ε)}) and a strengthened lower bound Ω(d^{2ε/(1+ε)} T^{1/(1+ε)}). When ε = 1 the two bounds coincide, and our rate recovers the optimal bound for variance-bounded rewards. For Matérn kernels, we derive the first regret bound in the infinite-dimensional setting that is sublinear for all ε ∈ (0,1]. Moreover, for structured action sets, such as ℓₚ-balls with p ≤ 1+ε, the regret scales with the intrinsic dimension, enabling further dimensionality reduction. Our results improve both the dimension dependence and the ε-robustness of prior work.
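The "robust estimation via truncation and median aggregation" mentioned above refers to two standard heavy-tail mean estimators. A minimal, illustrative sketch in Python (the threshold `b` and group count `k` are generic placeholders, not the paper's tuned values):

```python
import numpy as np

def truncated_mean(rewards, b):
    """Clip observations to [-b, b], then average.

    Truncation at a level tuned to the (1+eps)-th moment bound
    trades a controlled bias for bounded observations.
    """
    r = np.asarray(rewards, dtype=float)
    return float(np.mean(np.clip(r, -b, b)))

def median_of_means(rewards, k):
    """Split into k groups, average each group, take the median.

    The median of group means concentrates even when the reward
    distribution only has a finite (1+eps)-th moment.
    """
    r = np.asarray(rewards, dtype=float)
    groups = np.array_split(r, k)
    return float(np.median([g.mean() for g in groups]))
```

For example, `truncated_mean([0.5, 10.0, -0.5], 1.0)` clips the outlier 10.0 down to 1.0 before averaging, which is what keeps single heavy-tailed draws from dominating the estimate.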

📝 Abstract
We study stochastic linear bandits with heavy-tailed rewards, where the rewards have a finite $(1+\epsilon)$-absolute central moment bounded by $\upsilon$ for some $\epsilon \in (0,1]$. We improve both upper and lower bounds on the minimax regret compared to prior work. When $\upsilon = \mathcal{O}(1)$, the best prior known regret upper bound is $\tilde{\mathcal{O}}(d T^{\frac{1}{1+\epsilon}})$. While a lower bound with the same scaling has been given, it relies on a construction using $\upsilon = \mathcal{O}(d)$, and adapting the construction to the bounded-moment regime with $\upsilon = \mathcal{O}(1)$ yields only an $\Omega(d^{\frac{\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}})$ lower bound. This matches the known rate for multi-armed bandits and is generally loose for linear bandits, in particular being $\sqrt{d}$ below the optimal rate in the finite-variance case ($\epsilon = 1$). We propose a new elimination-based algorithm guided by experimental design, which achieves regret $\tilde{\mathcal{O}}(d^{\frac{1+3\epsilon}{2(1+\epsilon)}} T^{\frac{1}{1+\epsilon}})$, thus improving the dependence on $d$ for all $\epsilon \in (0,1)$ and recovering a known optimal result for $\epsilon = 1$. We also establish a lower bound of $\Omega(d^{\frac{2\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}})$, which strictly improves upon the multi-armed bandit rate and highlights the hardness of heavy-tailed linear bandit problems. For finite action sets, we derive similarly improved upper and lower bounds for the regret. Finally, we provide action-set-dependent regret upper bounds showing that for some geometries, such as $\ell_p$-norm balls for $p \le 1 + \epsilon$, we can further reduce the dependence on $d$, and we can handle infinite-dimensional settings via the kernel trick, in particular establishing new regret bounds for the Matérn kernel that are the first to be sublinear for all $\epsilon \in (0, 1]$.
Problem

Research questions and friction points this paper is trying to address.

Improves regret bounds for heavy-tailed linear bandits.
Addresses suboptimal dependence on dimension in prior work.
Establishes tighter lower bounds for heavy-tailed rewards.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Elimination-based algorithm guided by experimental design.
Improved upper and lower regret bounds for heavy-tailed rewards.
Handles infinite-dimensional settings via the kernel trick.
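The elimination idea can be illustrated with a deliberately simplified sketch for a finite action set: uniform exploration over the surviving arms stands in for the paper's optimal experimental design, and median-of-means stands in for the full truncation analysis. All parameters (`pulls`, `tol0`, group count) are illustrative placeholders:

```python
import numpy as np

def median_of_means(x, k=5):
    # Robust mean: median of k group averages; concentrates even
    # when the noise only has a finite (1+eps)-th moment.
    groups = np.array_split(np.asarray(x, dtype=float), k)
    return float(np.median([g.mean() for g in groups]))

def phased_elimination(A, theta, noise, T=2000, pulls=100, tol0=1.0):
    """Simplified phased elimination for a finite action set.

    A: (n, d) matrix of actions; theta: true parameter; noise(m)
    returns m noise samples. Each phase pulls every surviving arm
    `pulls` times, forms a robust per-arm reward estimate, fits
    theta by least squares, and drops arms whose estimated value
    trails the best by more than a tolerance that halves per phase.
    """
    alive = np.arange(len(A))
    t, tol = 0, tol0
    while t < T and len(alive) > 1:
        est = np.array([
            median_of_means(A[i] @ theta + noise(pulls)) for i in alive
        ])
        t += pulls * len(alive)
        theta_hat, *_ = np.linalg.lstsq(A[alive], est, rcond=None)
        vals = A[alive] @ theta_hat
        alive = alive[vals >= vals.max() - tol]
        tol /= 2.0
    return alive
```

With heavy-tailed noise one could pass, e.g., centered Pareto draws such as `lambda m: rng.pareto(1.5, m) - 2.0`, whose variance is infinite but whose (1+ε)-th moment is finite for ε < 0.5; the median-of-means step is what keeps such draws from derailing the least-squares fit.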