Sparse Optimistic Information Directed Sampling

📅 2025-10-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies the high-dimensional sparse linear bandit problem. Its goal is an adaptive algorithm that achieves optimal worst-case regret in both the data-rich and the data-poor regimes. To this end, the authors study Sparse Optimistic Information Directed Sampling (SOIDS), a non-Bayesian variant of Information Directed Sampling that balances information gain against regret. They analyze it with a novel technique that permits a time-dependent learning rate. Theoretically, SOIDS attains minimax-optimal worst-case regret in both regimes simultaneously, which makes it the first algorithm to do so. Empirically, it is reported to outperform baselines on synthetic and real-world datasets. The core contribution is that the algorithm no longer has to be designed for a single regime: SOIDS is optimal in whichever regime applies, without prior knowledge of which one that is.

📝 Abstract

Many high-dimensional online decision-making problems can be modeled as stochastic sparse linear bandits. Most existing algorithms are designed to achieve optimal worst-case regret in either the data-rich regime, where polynomial dependence on the ambient dimension is unavoidable, or the data-poor regime, where dimension-independence is possible at the cost of worse dependence on the number of rounds. In contrast, the sparse Information Directed Sampling (IDS) algorithm satisfies a Bayesian regret bound that has the optimal rate in both regimes simultaneously. In this work, we explore the use of Sparse Optimistic Information Directed Sampling (SOIDS) to achieve the same adaptivity in the worst-case setting, without Bayesian assumptions. Through a novel analysis that enables the use of a time-dependent learning rate, we show that SOIDS can optimally balance information and regret. Our results extend the theoretical guarantees of IDS, providing the first algorithm that simultaneously achieves optimal worst-case regret in both the data-rich and data-poor regimes. We empirically demonstrate the good performance of SOIDS.
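Schematically, IDS-style analyses can produce both a sqrt(n) rate and an n^(2/3) rate from a single kind of condition, a generalized information ratio. The derivation below is a minimal sketch of that standard argument, not the paper's new time-dependent-learning-rate analysis. The symbols Psi_lambda (information-ratio constant) and Gamma (total information budget) are notation introduced here, and expectations are suppressed for readability.

```latex
% Assumed condition, for an exponent \lambda > 1 and every round t:
%   \Delta_t^{\lambda} \le \Psi_\lambda \, I_t,
% where \Delta_t is the per-round regret and I_t the information gained,
% and the cumulative information satisfies \sum_{t=1}^{n} I_t \le \Gamma.
\begin{align*}
R_n = \sum_{t=1}^{n} \Delta_t
  &\le \Psi_\lambda^{1/\lambda} \sum_{t=1}^{n} I_t^{1/\lambda} \\
  &\le \Psi_\lambda^{1/\lambda} \, n^{1-1/\lambda}
       \Big( \sum_{t=1}^{n} I_t \Big)^{1/\lambda}
       && \text{(H\"older with } p = \lambda,\ q = \tfrac{\lambda}{\lambda-1}) \\
  &\le (\Psi_\lambda \Gamma)^{1/\lambda} \, n^{1-1/\lambda}.
\end{align*}
% \lambda = 2 gives (\Psi_2 \Gamma)^{1/2} n^{1/2};
% \lambda = 3 gives (\Psi_3 \Gamma)^{1/3} n^{2/3}.
```

An algorithm that satisfies both conditions at once enjoys the minimum of the two bounds. This is the sense in which a single method can be rate-optimal in two regimes.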

Problem

Research questions and friction points this paper is trying to address.

Achieving optimal worst-case regret in sparse linear bandits

Balancing information and regret without Bayesian assumptions

Adapting to both data-rich and data-poor regimes simultaneously
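"Balancing information and regret" is what IDS-style action selection optimizes. The code below is a minimal sketch of classical IDS action selection, not SOIDS itself. It assumes per-action estimates of expected regret Delta(a) and information gain I(a) are already available from some estimator not shown here. It then picks a sampling distribution pi that minimizes the information ratio Delta(pi)^2 / I(pi). Russo and Van Roy showed that some minimizer puts mass on at most two actions, so the sketch searches over action pairs and a grid of mixing weights. The function name `ids_distribution` and the grid size are hypothetical choices.

```python
import itertools

import numpy as np


def ids_distribution(gaps, info, grid=1001):
    """Minimise Delta(pi)^2 / I(pi) over distributions supported on <= 2 actions.

    gaps: estimated expected regret Delta(a) >= 0 per action (assumed given).
    info: estimated information gain I(a) >= 0 per action (assumed given).
    Returns (pi, ratio): a probability vector over actions and its ratio.
    """
    gaps = np.asarray(gaps, dtype=float)
    info = np.asarray(info, dtype=float)
    k = len(gaps)
    q = np.linspace(0.0, 1.0, grid)  # candidate weights on the first action of a pair
    best_ratio, best_pi = np.inf, None
    # Pairs (i, i) cover deterministic choices; pairs (i, j) cover two-point mixtures.
    for i, j in itertools.combinations_with_replacement(range(k), 2):
        d = q * gaps[i] + (1 - q) * gaps[j]  # expected regret of the mixture
        g = q * info[i] + (1 - q) * info[j]  # expected information gain of the mixture
        with np.errstate(divide="ignore", invalid="ignore"):
            # Zero information: ratio is 0 if the mixture is also regret-free, else +inf.
            ratio = np.where(g > 0, d**2 / g, np.where(d == 0, 0.0, np.inf))
        idx = int(np.argmin(ratio))
        if ratio[idx] < best_ratio:
            best_ratio = float(ratio[idx])
            best_pi = np.zeros(k)
            best_pi[i] += q[idx]
            best_pi[j] += 1 - q[idx]
    return best_pi, best_ratio


if __name__ == "__main__":
    # Hypothetical estimates: action 0 is cheap but uninformative,
    # action 1 is costly but very informative, action 2 is poor on both counts.
    pi, ratio = ids_distribution([0.1, 0.5, 0.3], [0.01, 1.0, 0.05])
    print("pi =", np.round(pi, 3), "ratio =", round(ratio, 4))
```

In this example, neither pure action is best. Mixing the low-regret action with the informative one gives a lower ratio than either action alone. The grid search is chosen for clarity: for a fixed pair, setting the derivative of the ratio to zero yields a linear equation in the mixing weight, so an exact per-pair minimizer is also easy to compute.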

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Optimistic Information Directed Sampling algorithm

Time-dependent learning rate enables an optimal balance between information gain and regret

Achieves optimal worst-case regret in both the data-rich and data-poor regimes
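To make "optimal in both regimes" concrete, the sketch below evaluates a bound of the form (Psi_lambda * Gamma)^(1/lambda) * n^(1 - 1/lambda). This form follows from a generalized information-ratio condition Delta_t^lambda <= Psi_lambda * I_t together with sum_t I_t <= Gamma, by Hölder's inequality. The sketch computes it for lambda = 2 (sqrt(n) rate) and lambda = 3 (n^(2/3) rate) and keeps the smaller value. The constants are hypothetical and only illustrate an assumed shape, with Psi_2 growing with the ambient dimension and Psi_3 not. They are not the paper's constants, and the function names are illustrative.

```python
def ir_regret_bound(psi: float, gamma: float, n: float, lam: float) -> float:
    """(psi * gamma)^(1/lam) * n^(1 - 1/lam): the regret bound implied by
    Delta_t^lam <= psi * I_t and sum_t I_t <= gamma, via Hölder's inequality."""
    return (psi * gamma) ** (1.0 / lam) * n ** (1.0 - 1.0 / lam)


def adaptive_bound(psi2: float, psi3: float, gamma: float, n: float) -> float:
    """Smaller of the sqrt(n) (lam = 2) and n^(2/3) (lam = 3) bounds at horizon n."""
    return min(ir_regret_bound(psi2, gamma, n, 2.0),
               ir_regret_bound(psi3, gamma, n, 3.0))


def crossover_horizon(psi2: float, psi3: float, gamma: float) -> float:
    """Horizon n* where the two bounds coincide: n* = (psi2*gamma)^3 / (psi3*gamma)^2.

    For n < n* the n^(2/3) bound is smaller (data-poor side);
    for n > n* the sqrt(n) bound is smaller (data-rich side).
    """
    return (psi2 * gamma) ** 3 / (psi3 * gamma) ** 2


if __name__ == "__main__":
    # Hypothetical constants: psi2 scales with the ambient dimension (1000 here),
    # psi3 is dimension-free (10 here), and the information budget gamma is 1.
    psi2, psi3, gamma = 1000.0, 10.0, 1.0
    print(f"crossover horizon n* = {crossover_horizon(psi2, psi3, gamma):.3g}")
    for n in (1e4, 1e7, 1e9):
        b2 = ir_regret_bound(psi2, gamma, n, 2.0)
        b3 = ir_regret_bound(psi3, gamma, n, 3.0)
        best = adaptive_bound(psi2, psi3, gamma, n)
        print(f"n={n:.0e}: sqrt(n) bound {b2:.3g}, n^(2/3) bound {b3:.3g}, min {best:.3g}")
```

The point of the illustration is that an algorithm with both guarantees never has to know on which side of n* it operates: the smaller bound applies automatically.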
