Instance-dependent Stochastic Lipschitz bandit

📅 2026-05-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses a limitation of existing regret bounds for Lipschitz bandits: they depend solely on the asymptotic growth of near-optimal level sets and fail to capture the finer structure of the objective function. The paper introduces an analysis framework that integrates suboptimality gaps over level sets and, through an adaptive refinement strategy, establishes an instance-dependent regret bound that ties regret directly to the local geometry of those level sets. When the set of maximizers has dimension \(d^* > 0\), the resulting adaptive regret bound is \(\widetilde{O}(T^{(d_z+1)/(\max(d_z,d^*)+2)})\), where \(d_z\) is the zooming dimension. This improves upon the classical zooming bound, and the analysis extends to the full-information Lipschitz experts setting.
📝 Abstract
We study the Lipschitz bandit problem, where a learner sequentially maximizes an unknown Lipschitz function $f$ over a domain $\mathcal{X} \subset [0,1]^d$ using noisy pointwise evaluations. Existing regret bounds are either worst-case, scaling as $\tilde{\Theta}\left(T^{(d+1)/(d+2)}\right)$, or adaptive via the zooming dimension $d_z$, yielding $\tilde{\Theta}\left(T^{(d_z+1)/(d_z+2)}\right)$. However, such zooming-based guarantees are only partially instance-dependent, as they depend solely on the asymptotic growth of near-optimal level sets and fail to capture finer structural properties of $f$. We provide an analysis and an algorithm that characterize the regret through integrals of the suboptimality gap of $f$ over its level sets. This yields regret bounds that adapt to the local growth of level sets, rather than only their asymptotic behavior. As a corollary, when the set of maximizers has dimension $d^\star>0$, we obtain improved adaptive rates of order $\tilde{\mathcal{O}}\left(T^{(d_z+1)/(\max(d_z,d^\star)+2)}\right)$, strictly improving over classical zooming bounds in this regime. Finally, we extend our analysis to the full-information setting (Lipschitz experts) and show how some of the regularity assumptions can be relaxed.
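To make the three rates in the abstract easier to compare, the following minimal sketch evaluates only the exponents of $T$, ignoring logarithmic factors and constants. The function names and the sample values of $d$, $d_z$, and $d^\star$ are hypothetical and chosen for illustration; they do not come from the paper. Note that, for the formula as stated, the level-set exponent coincides with the zooming exponent whenever $d^\star \le d_z$.

```python
from fractions import Fraction


def worst_case_exponent(d):
    """Exponent of T in the worst-case rate T^((d+1)/(d+2))."""
    return Fraction(d + 1, d + 2)


def zooming_exponent(d_z):
    """Exponent of T in the zooming rate T^((d_z+1)/(d_z+2))."""
    return Fraction(d_z + 1, d_z + 2)


def level_set_exponent(d_z, d_star):
    """Exponent of T in the rate T^((d_z+1)/(max(d_z, d_star)+2))."""
    return Fraction(d_z + 1, max(d_z, d_star) + 2)


if __name__ == "__main__":
    # Illustrative (hypothetical) values: ambient dimension d,
    # zooming dimension d_z, maximizer-set dimension d_star.
    d = 3
    for d_z, d_star in [(1, 2), (2, 1), (1, 1)]:
        print(
            f"d_z={d_z}, d_star={d_star}: "
            f"worst-case={worst_case_exponent(d)}, "
            f"zooming={zooming_exponent(d_z)}, "
            f"level-set={level_set_exponent(d_z, d_star)}"
        )
```

Exact fractions are used instead of floats so that equal exponents compare as equal. Arguments should be integers or `Fraction` values, since `Fraction` does not accept a float numerator together with a denominator.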
Problem

Research questions and friction points this paper is trying to address.

Lipschitz bandit
instance-dependent regret
zooming dimension
level sets
suboptimality gap
Innovation

Methods, ideas, or system contributions that make the work stand out.

instance-dependent regret
Lipschitz bandits
level set integration
adaptive rates
zooming dimension