Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update

📅 2025-03-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper studies the stochastic linear multi-armed bandit problem under heavy-tailed noise. Existing approaches—such as truncation and median-of-means methods—rely on strong noise assumptions (e.g., bounded moments) or restrictive structural constraints (e.g., sparsity), while recent adaptive Huber regression achieves broader applicability but incurs high computational cost due to full historical data storage and per-round full-data sweeps. To address these limitations, we propose the first adaptive Huber algorithm embedded within an online mirror descent framework, enabling single-pass, memory-efficient updates: no historical data storage is required, and per-round computation is merely $\widetilde{\mathcal{O}}(1)$. Under minimal noise assumptions—only requiring finite $(1+\epsilon)$-th moments—we establish a variance-aware, near-optimal regret bound of $\widetilde{\mathcal{O}}\big(d T^{\frac{1-\epsilon}{2(1+\epsilon)}} \sqrt{\sum_{t=1}^T \nu_t^2} + d T^{\frac{1-\epsilon}{2(1+\epsilon)}}\big)$, eliminating structural dependencies inherent in prior methods.
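To make the core idea concrete, below is a minimal sketch of a one-pass Huber-style update, not the paper's exact algorithm. It uses plain online gradient descent (the Euclidean special case of online mirror descent) on the Huber loss of the current sample only, so each round costs $O(d)$ time and no history is stored. The names `huber_score`, `one_pass_update`, and the schedules for the truncation level `tau_t` and step size `eta_t` are illustrative assumptions, not from the paper.

```python
import numpy as np

def huber_score(residual, tau):
    """Huber influence function: identity inside [-tau, tau], clipped outside.

    Clipping is what gives robustness to heavy-tailed noise: a single
    outlier can move the estimate by at most eta * tau * ||x||.
    """
    return np.clip(residual, -tau, tau)

def one_pass_update(theta, x_t, y_t, tau_t, eta_t):
    """One mirror-descent-style step on the Huber loss of the current sample.

    The gradient of the Huber loss at residual r = y - <x, theta> with
    respect to theta is -psi_tau(r) * x, where psi_tau is the clipped score.
    Only (x_t, y_t) is touched, so per-round cost is O(d) time and memory.
    """
    residual = y_t - x_t @ theta
    grad = -huber_score(residual, tau_t) * x_t
    return theta - eta_t * grad

# Illustrative run under heavy-tailed (Student-t, df=2) noise.
rng = np.random.default_rng(0)
d, T = 5, 2000
theta_star = np.ones(d) / np.sqrt(d)
theta = np.zeros(d)
for t in range(1, T + 1):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    y = x @ theta_star + rng.standard_t(df=2)        # finite (1+eps)-th moment
    theta = one_pass_update(theta, x, y, tau_t=t**0.25, eta_t=1.0 / np.sqrt(t))
```

The growing truncation level `tau_t` mirrors the adaptive-Huber idea of clipping less as more data arrives; the actual schedules in the paper are chosen to obtain the stated variance-aware regret bound.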

📝 Abstract
We study stochastic linear bandits with heavy-tailed noise. Two principled strategies for handling heavy-tailed noise, truncation and median-of-means, have been introduced to heavy-tailed bandits. Nonetheless, these methods rely on specific noise assumptions or bandit structures, limiting their applicability to general settings. The recent work [Huang et al., 2024] develops a soft truncation method via adaptive Huber regression to address these limitations. However, their method suffers from an undesired computational cost: it requires storing all historical data and performing a full pass over these data at each round. In this paper, we propose a *one-pass* algorithm based on the online mirror descent framework. Our method updates using only the current data at each round, reducing the per-round computational cost from $\widetilde{\mathcal{O}}(t \log T)$ to $\widetilde{\mathcal{O}}(1)$ with respect to the current round $t$ and the time horizon $T$, and achieves a near-optimal and variance-aware regret of order $\widetilde{\mathcal{O}}\big(d T^{\frac{1-\epsilon}{2(1+\epsilon)}} \sqrt{\sum_{t=1}^T \nu_t^2} + d T^{\frac{1-\epsilon}{2(1+\epsilon)}}\big)$, where $d$ is the dimension and $\nu_t^{1+\epsilon}$ is the $(1+\epsilon)$-th central moment of the reward at round $t$.
Problem

Research questions and friction points this paper is trying to address.

Address heavy-tailed noise in linear bandits
Reduce computational cost in bandit algorithms
Achieve near-optimal regret with one-pass updates
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-pass algorithm for online mirror descent
Reduces computational cost to O(1) per round
Achieves near-optimal, variance-aware regret bounds
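The computational contrast behind these bullets can be sketched as follows. Both classes below fit a Huber-type regression under clipped residuals; the first re-sweeps all stored history each round (the cost profile attributed to prior adaptive Huber regression), while the second keeps only the current estimate. The class names, the fixed step sizes, and the number of sweeps are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class FullHistoryHuber:
    """Stores every sample and re-fits each round: O(t) time and memory at round t."""
    def __init__(self, d):
        self.X, self.y = [], []
        self.theta = np.zeros(d)

    def update(self, x_t, y_t, tau, sweeps=10, eta=0.1):
        self.X.append(x_t)
        self.y.append(y_t)
        X, y = np.asarray(self.X), np.asarray(self.y)
        for _ in range(sweeps):                       # full pass over all history
            r = y - X @ self.theta
            self.theta += eta * (np.clip(r, -tau, tau) @ X) / len(y)
        return self.theta

class OnePassHuber:
    """Keeps only the current estimate: O(1) time and memory per round."""
    def __init__(self, d):
        self.theta = np.zeros(d)

    def update(self, x_t, y_t, tau, eta):
        r = y_t - x_t @ self.theta                    # current sample only
        self.theta += eta * np.clip(r, -tau, tau) * x_t
        return self.theta
```

The one-pass variant never materializes the history, which is exactly the memory-efficiency claim above; the price is that each sample is visited once, and the analysis must show this still yields near-optimal regret.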
🔎 Similar Papers
2024-10-02 · International Conference on Machine Learning · Citations: 1
2024-07-24 · arXiv.org · Citations: 4
Jing Wang
National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China
Yu-Jie Zhang
RIKEN AIP
Machine Learning · Online Learning · Weakly Supervised Learning
Peng Zhao
National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China
Zhi-Hua Zhou
Nanjing University
Artificial Intelligence · Machine Learning · Data Mining