🤖 AI Summary
This work addresses heteroskedastic generalized linear bandits (GLBs) under adversarial corruption, a setting that subsumes models such as linear, logistic, and Poisson regression. The authors propose HCW-GLB-OMD, an algorithm that combines an online mirror descent (OMD)-based estimator with a Hessian-based confidence-weighting mechanism to achieve corruption robustness under a self-concordance assumption on the link function. Within this unified framework, the method attains an instance-dependent regret upper bound of $\widetilde{O}\left(d\sqrt{\sum_t g(\tau_t)\dot{\mu}_{t,\star}} + d^2 g_{\max} \kappa + d \kappa C\right)$, matching the accompanying lower bound of $\widetilde{\Omega}\left(d\sqrt{\sum_t g(\tau_t)\dot{\mu}_{t,\star}} + dC\right)$ up to a $\kappa$ factor in the corruption term. Since the algorithm also runs in constant per-round time and space, it offers instance-wise minimax optimality and computational efficiency simultaneously.
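Since the summary names the algorithm's ingredients without giving the update rule, here is a minimal sketch of how an OMD-style GLM update with a Hessian-based confidence weight might be assembled, assuming a logistic link; `omd_round`, the leverage-based weight rule, and `gamma` are hypothetical placeholders for illustration, not the paper's actual HCW-GLB-OMD specification.

```python
import numpy as np

# Hypothetical sketch only: the weight formula and the parameter gamma
# are illustrative placeholders, NOT the paper's HCW-GLB-OMD.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def omd_round(theta, H, x, y, gamma=1.0):
    """One OMD-style round for a logistic link with a Hessian-based weight.

    theta : current parameter estimate, shape (d,)
    H     : running weighted-Hessian preconditioner, shape (d, d)
    x     : chosen arm's feature vector, shape (d,)
    y     : observed (possibly corrupted) binary reward
    gamma : confidence-weight scale (placeholder)
    """
    mu = sigmoid(x @ theta)            # predicted mean reward
    mu_dot = mu * (1.0 - mu)           # slope of the logistic link

    # Confidence weight: shrink the influence of rounds whose arm is
    # poorly covered by the accumulated curvature (placeholder rule).
    leverage = x @ np.linalg.solve(H, x)   # ||x||_{H^{-1}}^2
    w = min(1.0, gamma / leverage)

    grad = w * (mu - y) * x            # weighted GLM log-loss gradient

    # Accumulate weighted curvature, then take a preconditioned step
    # (an OMD step w.r.t. a local quadratic norm); a projection onto
    # the parameter set would follow in a complete implementation.
    H = H + w * mu_dot * np.outer(x, x)
    theta = theta - np.linalg.solve(H, grad)
    return theta, H

# Toy usage: a few rounds against a fixed unknown parameter.
rng = np.random.default_rng(0)
d = 5
theta_star = np.ones(d) / np.sqrt(d)
theta, H = np.zeros(d), np.eye(d)
for _ in range(200):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    y = float(rng.random() < sigmoid(x @ theta_star))
    theta, H = omd_round(theta, H, x, y)
```

Note that the only state carried across rounds is $(\theta, H)$, so memory does not grow with $t$, consistent with the constant per-iteration cost claimed above (treating the dimension $d$ as fixed).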
📝 Abstract
We consider the problem of heteroskedastic generalized linear bandits (GLBs) with adversarial corruptions, which subsumes various stochastic contextual bandit settings, including heteroskedastic linear bandits and logistic/Poisson bandits. We propose HCW-GLB-OMD, which consists of two components: an online mirror descent (OMD)-based estimator and Hessian-based confidence weights that confer corruption robustness. The algorithm is computationally efficient, requiring only $O(1)$ space and time per iteration. Under the self-concordance assumption on the link function, we show a regret bound of $\tilde{O}\left( d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d^2 g_{\max} \kappa + d \kappa C \right)$, where $\dot{\mu}_{t,\star}$ is the slope of $\mu$ around the optimal arm at time $t$, $g(\tau_t)$'s are potentially exogenously time-varying dispersions (e.g., $g(\tau_t) = \sigma_t^2$ for heteroskedastic linear bandits, $g(\tau_t) = 1$ for Bernoulli and Poisson), $g_{\max} = \max_{t \in [T]} g(\tau_t)$ is the maximum dispersion, and $C \geq 0$ is the total corruption budget of the adversary. We complement this with a lower bound of $\tilde{\Omega}(d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d C)$, unifying previous problem-specific lower bounds. Thus, our algorithm achieves, up to a $\kappa$-factor in the corruption term, instance-wise minimax optimality simultaneously across various instances of heteroskedastic GLBs with adversarial corruptions.
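To make the unified rate concrete, the leading term specializes as follows in two of the named settings; the link-slope identities ($\dot{\mu} \equiv 1$ for the identity link, $\dot{\mu} = \mu(1-\mu)$ for the logistic link) are standard GLM facts rather than statements from the abstract:

$$
g(\tau_t) = \sigma_t^2,\ \dot{\mu} \equiv 1 \ \text{(heteroskedastic linear)}: \quad \tilde{O}\Big(d \sqrt{\textstyle\sum_t \sigma_t^2}\Big),
$$

$$
g(\tau_t) = 1,\ \dot{\mu}_{t,\star} = \mu_{t,\star}(1 - \mu_{t,\star}) \ \text{(logistic)}: \quad \tilde{O}\Big(d \sqrt{\textstyle\sum_t \dot{\mu}_{t,\star}}\Big).
$$

Recovering these familiar problem-specific rates from one bound is the sense in which the result is simultaneously instance-wise optimal across settings.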