Optimal High-Probability Regret for Online Convex Optimization with Two-Point Bandit Feedback

📅 2026-03-26
🤖 AI Summary
This work addresses online convex optimization in an adversarial environment under two-point bandit feedback, where the player observes only the loss values at two queried points per round. For μ-strongly convex losses, the paper gives a high-probability analysis built on two-point gradient estimation. The analysis is designed to handle the heavy-tailed behavior of bandit gradient estimators, which makes standard concentration arguments difficult to apply. The authors establish the first high-probability regret bound of O(d(log T + log(1/δ))/μ), which is minimax optimal in both the time horizon T and the dimension d. This resolves an open problem for strongly convex losses under two-point feedback.

📝 Abstract
We consider the problem of Online Convex Optimization (OCO) with two-point bandit feedback in an adversarial environment. In this setting, a player attempts to minimize a sequence of adversarially generated convex loss functions while observing the value of each function at only two points. Although it is well known that two-point feedback allows for gradient estimation, achieving tight high-probability regret bounds for strongly convex functions has remained open, as highlighted by Agarwal et al. (2010). The primary challenge lies in the heavy-tailed nature of bandit gradient estimators, which makes standard concentration analysis difficult. In this paper, we resolve this open challenge by providing the first high-probability regret bound of $O(d(\log T + \log(1/\delta))/\mu)$ for $\mu$-strongly convex losses. Our result is minimax optimal with respect to both the time horizon $T$ and the dimension $d$.
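To make the phrase "two-point feedback allows for gradient estimation" concrete, here is a minimal Python sketch of the classical two-point estimator fed into online gradient descent with step size 1/(μt). It is an illustration under stated assumptions, not the paper's algorithm or analysis: the function names, the smoothing parameter `radius`, the Euclidean-ball domain, and the default values are all hypothetical choices made for this example.

```python
import numpy as np


def two_point_gradient(f, x, radius, rng):
    """Estimate the gradient of f at x from two function evaluations.

    Assumption: the classical estimator
        (d / (2 * radius)) * (f(x + radius*u) - f(x - radius*u)) * u
    with u drawn uniformly from the unit sphere.
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # normalized Gaussian is uniform on the sphere
    diff = f(x + radius * u) - f(x - radius * u)
    return (d / (2.0 * radius)) * diff * u


def bandit_ogd(losses, d, mu, radius=1e-3, domain_radius=1.0, seed=0):
    """Online gradient descent driven only by two-point loss values.

    Hypothetical setup: the domain is a Euclidean ball of radius
    `domain_radius`, and the step size at round t is 1 / (mu * t).
    Returns the sequence of center points x_1, ..., x_T.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    limit = domain_radius - radius  # keep both query points inside the ball
    centers = []
    for t, f in enumerate(losses, start=1):
        centers.append(x.copy())
        g = two_point_gradient(f, x, radius, rng)
        x = x - g / (mu * t)
        norm = np.linalg.norm(x)
        if norm > limit:
            x *= limit / norm  # Euclidean projection onto the shrunken ball
    return np.array(centers)


if __name__ == "__main__":
    # Toy example: the same mu-strongly convex quadratic at every round.
    mu = 1.0
    target = np.array([0.3, -0.2, 0.1])
    loss = lambda z: 0.5 * mu * np.sum((z - target) ** 2)
    centers = bandit_ogd([loss] * 500, d=3, mu=mu)
    print("distance to minimizer:", np.linalg.norm(centers[-1] - target))
```

The estimator is scaled by the dimension d, so a single estimate can be much larger than the true gradient, which is one way to see why high-probability control of the accumulated estimation error is delicate.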
Problem

Research questions and friction points this paper is trying to address.

Online Convex Optimization
Two-Point Bandit Feedback
High-Probability Regret
Strongly Convex Functions
Adversarial Environment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online Convex Optimization
Two-Point Bandit Feedback
High-Probability Regret
Strongly Convex Functions
Minimax Optimality