Instance-dependent Stochastic Lipschitz bandit

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses a limitation of existing regret bounds for Lipschitz bandits: they rely solely on the asymptotic growth of near-optimal level sets and fail to capture the fine structure of the objective function. To overcome this, the paper introduces an analysis framework based on integrating suboptimality gaps over level sets. Through an adaptive refinement strategy, it establishes a fully instance-dependent regret bound that directly links regret to the local geometric structure of level sets. When the dimension of the maximizer set satisfies $d^\star > 0$, the resulting adaptive regret bound is $\tilde{\mathcal{O}}\left(T^{(d_z+1)/(\max(d_z,d^\star)+2)}\right)$, improving upon the classical zooming bound. The analysis also extends to the full-information Lipschitz experts setting.

📝 Abstract

We study the Lipschitz bandit problem, where a learner sequentially maximizes an unknown Lipschitz function $f$ over a domain $\mathcal{X} \subset [0,1]^d$ using noisy pointwise evaluations. Existing regret bounds are either worst-case, scaling as $\tilde{\Theta}\left(T^{(d+1)/(d+2)}\right)$, or adaptive via the zooming dimension $d_z$, yielding $\tilde{\Theta}\left(T^{(d_z+1)/(d_z+2)}\right)$. However, such zooming-based guarantees are only partially instance-dependent, as they depend solely on the asymptotic growth of near-optimal level sets and fail to capture finer structural properties of $f$. We provide an analysis and an algorithm that characterize the regret through integrals of the suboptimality gap of $f$ over its level sets. This yields regret bounds that adapt to the local growth of level sets, rather than only their asymptotic behavior. As a corollary, when the set of maximizers has dimension $d^\star>0$, we obtain improved adaptive rates of order $\tilde{\mathcal{O}}\left(T^{(d_z+1)/(\max(d_z,d^\star)+2)}\right)$, strictly improving over classical zooming bounds in this regime. Finally, we extend our analysis to the full-information setting (Lipschitz experts) and show how some of the regularity assumptions can be relaxed.
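To make the three rates in the abstract easier to compare, the following minimal sketch evaluates only the exponents of $T$, ignoring constants and logarithmic factors. The function names and the example values of $d$, $d_z$, and $d^\star$ are hypothetical and chosen purely for illustration; they do not come from the paper.

```python
from fractions import Fraction


def worst_case_exponent(d):
    """Exponent of T in the worst-case rate T^((d+1)/(d+2))."""
    return Fraction(d + 1, d + 2)


def zooming_exponent(d_z):
    """Exponent of T in the zooming rate T^((d_z+1)/(d_z+2))."""
    return Fraction(d_z + 1, d_z + 2)


def level_set_exponent(d_z, d_star):
    """Exponent of T in the rate T^((d_z+1)/(max(d_z, d_star)+2))."""
    return Fraction(d_z + 1, max(d_z, d_star) + 2)


if __name__ == "__main__":
    # Hypothetical instance: ambient dimension 2, zooming dimension 1,
    # maximizer set of dimension 2 (illustrative values only).
    d, d_z, d_star = 2, 1, 2
    print("worst case:", worst_case_exponent(d))
    print("zooming:   ", zooming_exponent(d_z))
    print("level set: ", level_set_exponent(d_z, d_star))
```

As a matter of arithmetic on the stated formulas, the third exponent is smaller than the zooming exponent whenever the illustrative `d_star` exceeds `d_z`, and the two coincide otherwise, since the `max` then returns `d_z`.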

Problem

Research questions and friction points this paper is trying to address.

Lipschitz bandit

instance-dependent regret

zooming dimension

level sets

suboptimality gap

Innovation

Methods, ideas, or system contributions that make the work stand out.

instance-dependent regret

Lipschitz bandits

level set integration