Infrequent Exploration in Linear Bandits

📅 2025-10-29
🤖 AI Summary
Problem: Conventional exploration strategies for linear bandits (e.g., UCB, Thompson Sampling) may explore at every time step, which can be impractical or ethically unacceptable in safety-critical or costly applications, while purely greedy strategies succeed only under stringent diversity assumptions. Method: This paper proposes INFEX, a framework that runs a base exploratory policy (e.g., UCB or Thompson Sampling) only at scheduled, infrequent time points and takes greedy actions otherwise. INFEX features a modular design that accepts any fully adaptive exploration method as the base policy and supports lazy parameter updates that reduce computational latency. Contribution/Results: The work addresses a previously overlooked gap between fully adaptive exploration and purely greedy policies in linear bandits. The theoretical analysis shows that INFEX achieves instance-dependent regret matching standard provably efficient algorithms, provided the exploration frequency exceeds a logarithmic threshold. Empirical evaluations report state-of-the-art regret performance together with runtime improvements over existing methods.

📝 Abstract
We study the problem of infrequent exploration in linear bandits, addressing a significant yet overlooked gap between fully adaptive exploratory methods (e.g., UCB and Thompson Sampling), which explore potentially at every time step, and purely greedy approaches, which require stringent diversity assumptions to succeed. Continuous exploration can be impractical or unethical in safety-critical or costly domains, while purely greedy strategies typically fail without adequate contextual diversity. To bridge these extremes, we introduce a simple and practical framework, INFEX, explicitly designed for infrequent exploration. INFEX executes a base exploratory policy according to a given schedule while predominantly choosing greedy actions in between. Despite its simplicity, our theoretical analysis demonstrates that INFEX achieves instance-dependent regret matching standard provably efficient algorithms, provided the exploration frequency exceeds a logarithmic threshold. Additionally, INFEX is a general, modular framework that allows seamless integration of any fully adaptive exploration method, enabling wide applicability and ease of adoption. By restricting intensive exploratory computations to infrequent intervals, our approach can also enhance computational efficiency. Empirical evaluations confirm our theoretical findings, showing state-of-the-art regret performance and runtime improvements over existing methods.

Problem

Research questions and friction points this paper is trying to address.

Bridging the gap between fully adaptive exploration and purely greedy policies in linear bandits
Addressing the impracticality of continuous exploration in safety-critical or costly domains
Reducing computational overhead by confining exploratory computation to infrequent rounds

Innovation

Methods, ideas, or system contributions that make the work stand out.

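INFEX runs a base exploratory policy on a given infrequent schedule and acts greedily in between
Modular design integrates any fully adaptive exploration method as the base policy
Reduces computational load by restricting intensive exploratory computations to infrequent intervals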
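
The schedule-then-greedy loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: LinUCB is assumed as the base policy, and the names run_infex and doubling_schedule, the doubling schedule itself, the bonus scale alpha, and the simulated Gaussian rewards are all hypothetical choices made for the example. The doubling schedule is not claimed to satisfy the paper's threshold condition.

```python
import numpy as np


def doubling_schedule(T):
    """Hypothetical schedule: explore at rounds 0, 1, 3, 7, ... (about log2(T) rounds)."""
    steps, t = set(), 1
    while t <= T:
        steps.add(t - 1)
        t *= 2
    return steps


def run_infex(contexts, theta_star, explore_steps, alpha=1.0, lam=1.0,
              noise_sd=0.1, seed=0):
    """Sketch: LinUCB (assumed base policy) on scheduled rounds, greedy otherwise.

    contexts:      array (T, K, d) of arm features per round.
    theta_star:    true parameter (d,), used only to simulate rewards.
    explore_steps: set of round indices on which the base policy runs.
    """
    rng = np.random.default_rng(seed)
    T, K, d = contexts.shape
    V = lam * np.eye(d)   # regularized Gram matrix
    b = np.zeros(d)       # running sum of reward * feature
    regret, n_explore = 0.0, 0
    for t in range(T):
        X = contexts[t]                      # (K, d)
        theta_hat = np.linalg.solve(V, b)    # ridge estimate
        scores = X @ theta_hat               # greedy scores
        if t in explore_steps:
            # Add the UCB bonus alpha * sqrt(x^T V^{-1} x) for each arm.
            V_inv_Xt = np.linalg.solve(V, X.T)               # (d, K)
            scores = scores + alpha * np.sqrt(np.sum(X * V_inv_Xt.T, axis=1))
            n_explore += 1
        a = int(np.argmax(scores))
        means = X @ theta_star
        reward = means[a] + noise_sd * rng.normal()
        V += np.outer(X[a], X[a])            # update statistics every round
        b += reward * X[a]
        regret += means.max() - means[a]
    return regret, n_explore


T, K, d = 500, 5, 3
contexts = np.random.default_rng(1).normal(size=(T, K, d))
theta_star = np.array([1.0, -0.5, 0.3])
schedule = doubling_schedule(T)
regret, n_explore = run_infex(contexts, theta_star, schedule)
print(f"explored on {n_explore} of {T} rounds, cumulative regret {regret:.2f}")
```

In this sketch the estimate is updated on every round and only the bonus computation is confined to scheduled rounds. Swapping in Thompson Sampling would mean replacing the bonus step with a posterior sample of the parameter.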