Efficient Simple Regret Algorithms for Stochastic Contextual Bandits

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the absence of theoretical guarantees for stochastic contextual logistic bandits under the simple regret objective by proposing two efficient algorithms: a deterministic approach and a novel variant of Thompson Sampling. By combining the linear contextual bandit framework with self-concordant analysis, the paper establishes the first simple regret bounds for this model whose leading terms do not depend on the exponentially large constant κ = O(exp(S)). The deterministic algorithm achieves simple regret Õ(d/√T), while the randomized algorithm attains Õ(d^{3/2}/√T), both with κ-free leading terms. Empirical evaluations confirm the effectiveness and theoretical advantages of the proposed methods.
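Since the page does not define them, here is the standard setup these bounds refer to (the notation below is the usual one for logistic bandits and may differ from the paper's own): the mean reward of action $a$ in context $x$ is a sigmoid of a linear score, and simple regret measures the quality of a single action recommended after $T$ rounds of interaction.

$$
\mu(x,a) = \sigma\big(\langle \phi(x,a), \theta_\star \rangle\big),
\quad \sigma(z) = \frac{1}{1+e^{-z}},
\quad \|\theta_\star\| \le S,
$$

$$
\mathrm{SReg}(T) = \mathbb{E}\Big[\max_{a} \mu(x, a) - \mu(x, \hat a_T)\Big],
\qquad
\kappa = \sup_{x,a} \frac{1}{\dot\sigma\big(\langle \phi(x,a), \theta_\star \rangle\big)} = \mathcal{O}(e^{S}),
$$

where $\hat a_T$ is the recommended action. The point of the bounds above is that the exponentially large $\kappa$ does not multiply the leading $\tilde{\mathcal{O}}(d/\sqrt{T})$ term.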

📝 Abstract
We study stochastic contextual logistic bandits under the simple regret objective. While simple regret guarantees have been established for the linear case, no such results were previously known for the logistic setting. Building on ideas from contextual linear bandits and self-concordant analysis, we propose the first algorithm that achieves simple regret $\tilde{\mathcal{O}}(d/\sqrt{T})$. Notably, the leading term of our regret bound is free of the constant $\kappa = \mathcal O(\exp(S))$, where $S$ is a bound on the magnitude of the unknown parameter vector. The algorithm is shown to be fully tractable when the action set is finite. We also introduce a new variant of Thompson Sampling tailored to the simple-regret setting. This yields the first simple regret guarantee for randomized algorithms in stochastic contextual linear bandits, with regret $\tilde{\mathcal{O}}(d^{3/2}/\sqrt{T})$. Extending this method to the logistic case, we obtain a similarly structured Thompson Sampling algorithm that achieves the same regret bound -- $\tilde{\mathcal{O}}(d^{3/2}/\sqrt{T})$ -- again with no dependence on $\kappa$ in the leading term. The randomized algorithms, as expected, are cheaper to run than their deterministic counterparts. Finally, we conducted a series of experiments to empirically validate these theoretical guarantees.
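To make the Thompson Sampling idea in the abstract concrete, below is a minimal sketch of a generic Laplace-approximation Thompson Sampling loop for a logistic contextual bandit that, after $T$ exploration rounds, outputs a single recommended action (the quantity simple regret is measured on). This is a standard baseline sketch, not the paper's algorithm; the function names, the Newton-step posterior update, and all parameters are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def laplace_ts_logistic(contexts, reward_fn, T, d, lam=1.0, rng=None):
    """Generic Laplace-approximation Thompson Sampling for a logistic
    contextual bandit: explore for T rounds, then recommend one action.
    A hypothetical baseline sketch, NOT the paper's algorithm."""
    rng = np.random.default_rng() if rng is None else rng
    theta_hat = np.zeros(d)       # MAP estimate of the unknown parameter
    H = lam * np.eye(d)           # Hessian of the regularized log-loss at theta_hat
    X, Y = [], []
    for t in range(T):
        A = contexts(t)           # (K, d) feature matrix of round t's actions
        # Sample a parameter from the Gaussian (Laplace) posterior approximation
        theta_tilde = rng.multivariate_normal(theta_hat, np.linalg.inv(H))
        a = int(np.argmax(A @ theta_tilde))
        X.append(A[a])
        Y.append(reward_fn(t, a))  # Bernoulli reward in {0, 1}
        # A few Newton steps to refresh the MAP estimate and its Hessian
        Xm, Ym = np.array(X), np.array(Y)
        for _ in range(5):
            p = sigmoid(Xm @ theta_hat)
            g = Xm.T @ (p - Ym) + lam * theta_hat
            H = Xm.T @ (Xm * (p * (1 - p))[:, None]) + lam * np.eye(d)
            theta_hat = theta_hat - np.linalg.solve(H, g)
    # Recommend the greedy action under the final estimate
    return int(np.argmax(contexts(T) @ theta_hat))
```

On a well-separated instance, the recommended arm converges to the best one; the paper's contribution is the κ-free regret analysis, not this generic loop.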
Problem

Research questions and friction points this paper is trying to address.

stochastic contextual bandits
logistic bandits
simple regret
Thompson Sampling
regret bound
Innovation

Methods, ideas, or system contributions that make the work stand out.

simple regret
stochastic contextual bandits
logistic bandits
Thompson Sampling
self-concordant analysis
Shuai Liu
University of Alberta
Alireza Bakhtiari
University of Washington, Seattle
Alex Ayoub
Department of Computing Science, University of Alberta
Botao Hao
OpenAI
Csaba Szepesvári
University of Alberta