A Jointly Efficient and Optimal Algorithm for Heteroskedastic Generalized Linear Bandits with Adversarial Corruptions

📅 2026-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of heteroskedastic generalized linear bandits under adversarial corruptions, a setting that encompasses linear, logistic, and Poisson regression models. The authors propose the HCW-GLB-OMD algorithm, which integrates online mirror descent with a Hessian-based confidence-weighting mechanism to achieve robust learning under a self-concordance assumption on the link function. Within this unified framework, the method attains the first instance-dependent regret upper bound of $\widetilde{O}\left(d\sqrt{\sum_t g(\tau_t)\dot{\mu}_{t,\star}} + d^2 g_{\max} \kappa + d \kappa C\right)$, which matches the proven lower bound of $\widetilde{\Omega}\left(d\sqrt{\sum_t g(\tau_t)\dot{\mu}_{t,\star}} + dC\right)$ up to a $\kappa$-factor in the corruption term, while maintaining constant per-round time and space complexity, thus offering both instance-wise minimax optimality and computational efficiency.

📝 Abstract
We consider the problem of heteroskedastic generalized linear bandits (GLBs) with adversarial corruptions, which subsumes various stochastic contextual bandit settings, including heteroskedastic linear bandits and logistic/Poisson bandits. We propose HCW-GLB-OMD, which consists of two components: an online mirror descent (OMD)-based estimator and Hessian-based confidence weights to achieve corruption robustness. This is computationally efficient in that it only requires ${O}(1)$ space and time complexity per iteration. Under the self-concordance assumption on the link function, we show a regret bound of $\tilde{{O}}\left( d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d^2 g_{\max} \kappa + d \kappa C \right)$, where $\dot{\mu}_{t,\star}$ is the slope of $\mu$ around the optimal arm at time $t$, $g(\tau_t)$'s are potentially exogenously time-varying dispersions (e.g., $g(\tau_t) = \sigma_t^2$ for heteroskedastic linear bandits, $g(\tau_t) = 1$ for Bernoulli and Poisson), $g_{\max} = \max_{t \in [T]} g(\tau_t)$ is the maximum dispersion, and $C \geq 0$ is the total corruption budget of the adversary. We complement this with a lower bound of $\tilde{\Omega}(d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d C)$, unifying previous problem-specific lower bounds. Thus, our algorithm achieves, up to a $\kappa$-factor in the corruption term, instance-wise minimax optimality simultaneously across various instances of heteroskedastic GLBs with adversarial corruptions.
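The abstract describes two algorithmic ingredients: a one-pass online mirror descent (OMD) estimator and Hessian-based confidence weights that shrink updates on high-leverage rounds for corruption robustness. The paper's actual update rules are not given here, so the following is only a minimal illustrative sketch in that spirit, instantiated for a logistic-type GLM; the weight rule, step size `eta`, threshold `alpha`, and regularizer `lam` are hypothetical choices, not the authors' specification.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OMDGLBSketch:
    """Illustrative one-pass OMD/online-Newton estimator for a logistic-type
    GLM with a leverage-based confidence weight (hypothetical sketch)."""

    def __init__(self, d, lam=1.0, eta=0.5, alpha=1.0):
        self.theta = np.zeros(d)
        self.V = lam * np.eye(d)  # running weighted Hessian-like matrix
        self.eta = eta
        self.alpha = alpha

    def update(self, x, r):
        # Confidence weight: downweight rounds where x has high leverage
        # under the current metric V (a corruption-robustness heuristic).
        lev = x @ np.linalg.solve(self.V, x)
        w = min(1.0, self.alpha / np.sqrt(max(lev, 1e-12)))

        mu = sigmoid(x @ self.theta)       # predicted mean
        grad = w * (mu - r) * x            # weighted logistic-loss gradient
        dot_mu = mu * (1.0 - mu)           # local slope of the link function
        self.V += w * dot_mu * np.outer(x, x)

        # Mirror-descent step in the metric induced by V.
        self.theta -= self.eta * np.linalg.solve(self.V, grad)
        return w
```

Because the estimator touches each observation once and stores only `theta` and `V`, its per-round cost is constant in the horizon $T$ (though polynomial in $d$), mirroring the efficiency claim in the abstract.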
Problem

Research questions and friction points this paper is trying to address.

heteroskedastic generalized linear bandits
adversarial corruptions
stochastic contextual bandits
instance-wise minimax optimality
Innovation

Methods, ideas, or system contributions that make the work stand out.

heteroskedastic generalized linear bandits
adversarial corruptions
online mirror descent
instance-wise minimax optimality
Hessian-based confidence weights
Sanghwa Kim
KAIST, Seoul, Republic of Korea
Junghyun Lee
PhD Student @ KAIST AI
Bandits · Reinforcement Learning · Statistics · Learning Theory · Deep Learning Theory
Se-Young Yun
KAIST, Seoul, Republic of Korea