🤖 AI Summary
This paper studies the non-stationary Lipschitz bandit problem with an infinite action space: the reward function is Lipschitz continuous and may change arbitrarily over time, without prior knowledge of change points or structural assumptions on the non-stationarity. To minimize dynamic regret, we propose the first adaptive detection mechanism based on *significant shifts*, combined with hierarchical discretization of the action space, sliding-window estimation of cumulative rewards, and tracking of the Lipschitz reward function. Crucially, the algorithm needs no prior information about the environment's non-stationarity. We establish a minimax-optimal dynamic regret bound of $\tilde{O}(\tilde{L}^{1/3} T^{2/3})$, where $\tilde{L}$ denotes the number of significant shifts and $T$ is the horizon; this is the first provably optimal guarantee for this setting.
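To make the ingredients above concrete, here is a purely illustrative toy sketch (not the paper's algorithm): a uniform grid over $[0,1]$ stands in for one level of the hierarchical discretization, per-cell sliding windows estimate recent rewards, and a crude deviation test against an epoch baseline stands in for significant-shift detection. The function names, window size, threshold, and restart rule are all assumptions made for illustration.

```python
import random
import math

def run_toy_lipschitz_bandit(reward_fn, T=5000, n_arms=16, window=200, threshold=0.25, seed=0):
    """Toy non-stationary Lipschitz bandit on [0, 1]; all constants are illustrative assumptions."""
    rng = random.Random(seed)
    arms = [(i + 0.5) / n_arms for i in range(n_arms)]   # cell centers of a uniform partition
    history = [[] for _ in arms]                          # per-cell sliding windows of observed rewards
    baseline = [None] * len(arms)                         # windowed mean at the start of the current epoch
    total_reward = 0.0

    for t in range(T):
        # Optimistic play: pick the cell with the highest windowed mean plus a width/confidence bonus.
        def index(i):
            obs = history[i]
            if not obs:
                return float("inf")                       # force exploration of unplayed cells
            bonus = math.sqrt(2 * math.log(T) / len(obs)) + 1.0 / n_arms
            return sum(obs) / len(obs) + bonus

        i = max(range(len(arms)), key=index)
        r = reward_fn(arms[i], t) + rng.gauss(0.0, 0.1)   # noisy reward from the (possibly shifted) function
        total_reward += r

        history[i].append(r)
        if len(history[i]) > window:
            history[i].pop(0)
        if baseline[i] is None and len(history[i]) >= window // 4:
            baseline[i] = sum(history[i]) / len(history[i])

        # Crude stand-in for significant-shift detection: restart all windows
        # whenever a cell's windowed mean drifts far from its epoch baseline.
        mean_i = sum(history[i]) / len(history[i])
        if baseline[i] is not None and abs(mean_i - baseline[i]) > threshold:
            history = [[] for _ in arms]
            baseline = [None] * len(arms)

    return total_reward


if __name__ == "__main__":
    # Reward function that shifts abruptly halfway through the horizon.
    def reward_fn(x, t):
        peak = 0.2 if t < 2500 else 0.8
        return max(0.0, 1.0 - 3.0 * abs(x - peak))        # 3-Lipschitz in the action x

    print("cumulative reward:", round(run_toy_lipschitz_bandit(reward_fn), 1))
```

The sketch only conveys the interplay of discretization, windowed estimation, and restarts on detected change; the paper's actual procedure adapts the partition hierarchically and tests for significant shifts at multiple scales.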
📝 Abstract
We study the problem of non-stationary Lipschitz bandits, where the number of actions is infinite and the reward function, satisfying a Lipschitz assumption, can change arbitrarily over time. We design an algorithm that adaptively tracks the recently introduced notion of significant shifts, defined by large deviations of the cumulative reward function. To detect such reward changes, our algorithm leverages a hierarchical discretization of the action space. Without requiring any prior knowledge of the non-stationarity, our algorithm achieves a minimax-optimal dynamic regret bound of $\widetilde{\mathcal{O}}(\tilde{L}^{1/3} T^{2/3})$, where $\tilde{L}$ is the number of significant shifts and $T$ the horizon. This result provides the first optimal guarantee in this setting.