Flickering Multi-Armed Bandits

📅 2026-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a dynamic multi-armed bandit setting in which the action set evolves over time and is constrained by the agent's previous selections. For the first time, action availability is modeled as a stochastic graph process in which the agent can select only actions within a local neighborhood of its current position. A two-stage algorithm is proposed: it first identifies the optimal arm via a lazy random-walk exploration phase, then navigates to that arm and commits to it. Information-theoretic lower bounds are established under both Erdős–Rényi and Edge-Markovian graph models, and the algorithm achieves sublinear regret bounds both with high probability and in expectation under both models. Theoretical analysis and simulations, including a robotic reconnaissance scenario in a disaster zone, demonstrate the near-optimality of the approach and reveal the inherent cost of exploration under local mobility constraints.

📝 Abstract
We introduce Flickering Multi-Armed Bandits (FMAB), a new MAB framework where the set of available arms (or actions) can change at each round, and the available set at any time may depend on the agent's previously selected arm. We model this constrained, evolving availability using random graph processes, where arms are nodes and the agent's movement is restricted to its local neighborhood. We analyze this problem under two random graph models: an i.i.d. Erdős–Rényi (ER) process and an Edge-Markovian process. We propose and analyze a two-phase algorithm that employs a lazy random walk for exploration to efficiently identify the optimal arm, followed by a navigation and commitment phase for exploitation. We establish high-probability and expected sublinear regret bounds for both graph settings. We show that the exploration cost of our algorithm is near-optimal by establishing a matching information-theoretic lower bound for this problem class, highlighting the fundamental cost of exploration under local-move constraints. We complement our theoretical guarantees with numerical simulations, including a scenario of a robotic ground vehicle scouting a disaster-affected region.
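To make the two-phase idea concrete, here is a minimal Python sketch of a lazy-random-walk-then-commit policy on an i.i.d. Erdős–Rényi availability process. This is an illustration under simplified assumptions, not the paper's algorithm: the function name, Bernoulli rewards, the `T^(2/3)` exploration budget, and the greedy single-step commit rule are all hypothetical choices made for brevity.

```python
import random

def fmab_two_phase(K, T, p=0.3, explore_rounds=None, seed=0):
    """Illustrative sketch of a two-phase policy for a flickering bandit
    whose availability graph is an i.i.d. Erdos-Renyi process G(K, p).
    Parameter choices here are simplifications, not the paper's."""
    rng = random.Random(seed)
    means = [rng.random() for _ in range(K)]           # hidden Bernoulli arm means
    explore_rounds = explore_rounds or int(T ** (2 / 3))
    pos = rng.randrange(K)                             # agent's current arm (node)
    counts = [0] * K
    sums = [0.0] * K
    total_reward = 0.0

    def neighbors(v):
        # fresh ER graph each round: each edge present independently w.p. p
        return [u for u in range(K) if u != v and rng.random() < p]

    for t in range(T):
        nbrs = neighbors(pos)
        if t < explore_rounds:
            # lazy random walk: stay w.p. 1/2, else move to a uniform neighbor
            if nbrs and rng.random() < 0.5:
                pos = rng.choice(nbrs)
        else:
            # commitment phase: step to the empirical best arm when reachable
            best = max(range(K),
                       key=lambda a: sums[a] / counts[a] if counts[a] else 0.0)
            if pos != best and best in nbrs:
                pos = best
        reward = 1.0 if rng.random() < means[pos] else 0.0  # pull current arm
        counts[pos] += 1
        sums[pos] += reward
        total_reward += reward
    return total_reward, means
```

Note how the local-move constraint enters: the agent never chooses among all K arms, only among the neighbors the current round's graph exposes, which is exactly what makes exploration costly relative to a classical bandit.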
Problem

Research questions and friction points this paper is trying to address.

Multi-Armed Bandits
Dynamic Availability
Local-Move Constraints
Random Graph Processes
Regret Minimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Flickering Multi-Armed Bandits
Random Graph Processes
Lazy Random Walk
Sublinear Regret
Local-Move Constraints