🤖 AI Summary
This paper studies the stochastic linear multi-armed bandit problem under heavy-tailed noise. Existing approaches, such as truncation and median-of-means methods, rely on strong noise assumptions (e.g., bounded moments) or restrictive structural constraints (e.g., sparsity), while recent adaptive Huber regression achieves broader applicability but incurs high computational cost due to full historical data storage and per-round passes over all past data. To address these limitations, we propose the first adaptive Huber algorithm embedded within an online mirror descent framework, enabling single-pass, memory-efficient updates: no historical data storage is required, and the per-round computation is only $\widetilde{\mathcal{O}}(1)$. Under minimal noise assumptions, requiring only finite $(1+\epsilon)$-th moments, we establish a variance-aware, near-optimal regret bound of $\widetilde{\mathcal{O}}\big(d T^{\frac{1-\epsilon}{2(1+\epsilon)}} \sqrt{\sum_{t=1}^T \nu_t^2} + d T^{\frac{1-\epsilon}{2(1+\epsilon)}}\big)$, eliminating the structural dependencies inherent in prior methods.
📝 Abstract
We study stochastic linear bandits with heavy-tailed noise. Two principled strategies for handling heavy-tailed noise, truncation and median-of-means, have been introduced to heavy-tailed bandits. Nonetheless, these methods rely on specific noise assumptions or bandit structures, limiting their applicability to general settings. The recent work [Huang et al., 2024] develops a soft truncation method via adaptive Huber regression to address these limitations. However, their method suffers from an undesirable computational cost: it requires storing all historical data and performing a full pass over these data at each round. In this paper, we propose a *one-pass* algorithm based on the online mirror descent framework. Our method updates using only the current data at each round, reducing the per-round computational cost from $\widetilde{\mathcal{O}}(t \log T)$ to $\widetilde{\mathcal{O}}(1)$ with respect to the current round $t$ and the time horizon $T$, and achieves a near-optimal and variance-aware regret of order $\widetilde{\mathcal{O}}\big(d T^{\frac{1-\epsilon}{2(1+\epsilon)}} \sqrt{\sum_{t=1}^T \nu_t^2} + d T^{\frac{1-\epsilon}{2(1+\epsilon)}}\big)$, where $d$ is the dimension and $\nu_t^{1+\epsilon}$ is the $(1+\epsilon)$-th central moment of the reward at round $t$.
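To make the one-pass idea concrete, the sketch below performs a single mirror-descent step per round on the Huber loss of the current sample only, with a plain Euclidean mirror map (so the step reduces to gradient descent). This is a minimal illustration, not the paper's algorithm: the step size `eta`, the truncation level `tau`, and the toy heavy-tailed data model are hand-picked assumptions rather than the paper's adaptive choices.

```python
import numpy as np

def huber_grad(theta, x, y, tau):
    """Gradient of the Huber loss at sample (x, y); residuals beyond tau
    are soft-truncated, which limits the influence of heavy-tailed noise."""
    r = y - x @ theta
    return -np.clip(r, -tau, tau) * x

def omd_step(theta, x, y, tau, eta):
    """One-pass update: a single mirror-descent step on the current sample
    only (Euclidean mirror map, i.e. plain gradient descent). No history
    is stored, so the per-round cost is O(d)."""
    return theta - eta * huber_grad(theta, x, y, tau)

# Toy usage: rewards corrupted by Student-t noise with 2 degrees of freedom,
# which has infinite variance but finite (1+eps)-th moment for eps < 1.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -0.5])   # hypothetical ground-truth parameter
theta = np.zeros(2)
for t in range(1, 5001):
    x = rng.normal(size=2)
    x /= np.linalg.norm(x)
    y = x @ theta_star + rng.standard_t(df=2)
    eta = 0.5 / np.sqrt(t)           # hand-picked decaying step size
    tau = 2.0 * t ** 0.25            # hand-picked growing truncation level
    theta = omd_step(theta, x, y, tau, eta)
```

Each round touches only the current sample `(x, y)`, in contrast to refitting an adaptive Huber regression on all past data.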