Offline Local Search for Online Stochastic Bandits

📅 2026-04-10

🤖 AI Summary

This work addresses the challenge of converting offline local search algorithms into online stochastic combinatorial multi-armed bandit algorithms with low regret. The authors propose a general framework that systematically brings local search methods into online stochastic combinatorial optimization, a setting where they have so far been under-explored, with applications to scheduling, matroid basis selection, and clustering under uncertainty. Building on the approximation guarantee that offline local search provides when it terminates, the framework casts each problem as a stochastic combinatorial multi-armed bandit and, through a refined regret analysis, achieves an approximate regret bound of $O(\log^3 T)$, which depends only polylogarithmically on the time horizon $T$. By contrast, existing offline-to-online conversion frameworks give (approximate) regret bounds that are sublinear but polynomial in $T$, so this result is a substantial improvement for the problems where the framework applies.
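For reference, a standard way to write the $\alpha$-approximate regret for a stochastic combinatorial bandit with costs to be minimized is shown below. The notation is assumed here and may differ from the paper's: $\mathcal{A}$ is the set of feasible actions, $\mu(A)$ is the expected per-round cost of action $A$, $A_t$ is the action chosen at round $t$, and $\alpha \ge 1$ is the approximation factor guaranteed by the offline algorithm.

$$
\mathrm{Reg}_{\alpha}(T) \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T} \mu(A_t)\right] \;-\; \alpha\, T \min_{A \in \mathcal{A}} \mu(A)
$$

With $\alpha = 1$ this reduces to ordinary regret against the best fixed action. The summary's claim is that this quantity grows as $O(\log^3 T)$ under the proposed framework, whereas prior offline-to-online frameworks give bounds of the form $T^{c}$ for some $0 < c < 1$.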

📝 Abstract

Combinatorial multi-armed bandits provide a fundamental online decision-making setting in which a decision-maker interacts with an environment across $T$ time steps, each time selecting an action and learning the cost of that action. The goal is to minimize regret, defined as the cumulative loss relative to the best fixed action in hindsight, chosen with full information. There has been substantial interest in leveraging what is known about offline algorithm design in this online setting. Offline greedy and linear optimization algorithms (both exact and approximate) have been shown to provide useful guarantees when deployed online. We investigate local search methods, a broad class of algorithms used widely in both theory and practice, which have thus far been under-explored in this context. We focus on problems where offline local search terminates in an approximately optimal solution and give a generic method for converting such an offline algorithm into an online stochastic combinatorial bandit algorithm with $O(\log^3 T)$ (approximate) regret. In contrast, existing offline-to-online frameworks yield regret (and approximate regret) that depends sub-linearly, but polynomially, on $T$. We demonstrate the flexibility of our framework by applying it to three online stochastic combinatorial optimization problems: scheduling to minimize total completion time, finding a minimum-cost base of a matroid, and uncertain clustering.
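As a toy illustration of the kind of offline local search the abstract refers to, the sketch below runs adjacent-swap local search for single-machine scheduling to minimize total completion time, assuming job sizes are known exactly. The function names and the single-machine setting are hypothetical choices for illustration; this is not the paper's algorithm, and the online conversion (which must work from noisy cost observations) is not shown. Swapping adjacent jobs $a$ then $b$ changes the objective by $p_b - p_a$, so any local optimum is sorted by size (shortest processing time first).

```python
from typing import List, Sequence


def total_completion_time(order: Sequence[int], sizes: Sequence[float]) -> float:
    """Sum of job completion times when jobs run back to back on one machine."""
    elapsed = 0.0
    total = 0.0
    for job in order:
        elapsed += sizes[job]  # job finishes after all earlier jobs plus itself
        total += elapsed
    return total


def adjacent_swap_local_search(sizes: Sequence[float]) -> List[int]:
    """Hypothetical toy local search: apply improving adjacent swaps until none exists.

    Starts from the identity order. Each accepted swap strictly lowers the
    objective and there are finitely many orders, so the loop terminates at
    a local optimum (no improving adjacent swap).
    """
    order = list(range(len(sizes)))
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            candidate = order[:]
            candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
            if total_completion_time(candidate, sizes) < total_completion_time(order, sizes):
                order = candidate  # accept the improving move
                improved = True
    return order


if __name__ == "__main__":
    sizes = [3.0, 1.0, 2.0]
    order = adjacent_swap_local_search(sizes)
    print(order, total_completion_time(order, sizes))  # prints [1, 2, 0] 10.0
```

In this toy case the local optimum happens to be globally optimal; the setting described in the abstract only requires that local search terminate at an approximately optimal solution.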

Problem

Research questions and friction points this paper is trying to address.

online stochastic bandits

combinatorial optimization

local search

regret minimization

offline-to-online conversion

Innovation

Methods, ideas, or system contributions that make the work stand out.

local search

stochastic combinatorial bandits

logarithmic regret