🤖 AI Summary
This paper studies high-dimensional stochastic sparse linear bandits and asks whether a single algorithm can achieve optimal worst-case regret in both the data-rich and data-poor regimes. It analyzes Sparse Optimistic Information-Directed Sampling (SOIDS), a non-Bayesian algorithm that balances information and regret through a time-dependent learning rate, which a novel regret analysis makes possible. Theoretically, SOIDS is the first algorithm to attain the optimal worst-case regret rate in both regimes simultaneously, without prior knowledge of which regime applies. Empirically, the authors report good performance for SOIDS. The core contribution is to extend the adaptivity guarantees of sparse IDS from the Bayesian setting to the worst-case setting.
📝 Abstract
Many high-dimensional online decision-making problems can be modeled as stochastic sparse linear bandits. Most existing algorithms are designed to achieve optimal worst-case regret in either the data-rich regime, where polynomial dependence on the ambient dimension is unavoidable, or the data-poor regime, where dimension-independence is possible at the cost of worse dependence on the number of rounds. In contrast, the sparse Information Directed Sampling (IDS) algorithm satisfies a Bayesian regret bound that has the optimal rate in both regimes simultaneously. In this work, we explore the use of Sparse Optimistic Information Directed Sampling (SOIDS) to achieve the same adaptivity in the worst-case setting, without Bayesian assumptions. Through a novel analysis that enables the use of a time-dependent learning rate, we show that SOIDS can optimally balance information and regret. Our results extend the theoretical guarantees of IDS, providing the first algorithm that simultaneously achieves optimal worst-case regret in both the data-rich and data-poor regimes. We empirically demonstrate the good performance of SOIDS.
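
To make the two regimes concrete, the sketch below compares two illustrative regret rates. The abstract does not state the rates, so the forms used here are assumptions for illustration only: a dimension-dependent rate of order sqrt(s·d·T) for the data-rich regime and a dimension-free rate of order s·T^(2/3) for the data-poor regime, with constants, logarithmic factors, and problem-dependent quantities omitted. The names `s` (sparsity), `d` (ambient dimension), `T` (number of rounds), and all function names are hypothetical.

```python
import math


def regime_rates(s, d, T):
    """Return (data_rich_rate, data_poor_rate) under the assumed illustrative forms."""
    rich = math.sqrt(s * d * T)   # assumed: polynomial in the ambient dimension d
    poor = s * T ** (2 / 3)       # assumed: dimension-free, worse dependence on T
    return rich, poor


def active_regime(s, d, T):
    """Name the regime whose assumed rate is smaller at horizon T."""
    rich, poor = regime_rates(s, d, T)
    return "data-poor" if poor < rich else "data-rich"


def crossover_horizon(s, d):
    """Horizon at which the two assumed rates coincide.

    Setting sqrt(s*d*T) = s*T**(2/3) and solving for T gives T = (d/s)**3.
    """
    return (d / s) ** 3


if __name__ == "__main__":
    s, d = 5, 1000
    print("crossover horizon:", crossover_horizon(s, d))
    for T in (10**4, 10**6, 10**8):
        rich, poor = regime_rates(s, d, T)
        print(T, active_regime(s, d, T), round(min(rich, poor), 1))
```

Under these assumed forms, an algorithm that is adaptive in the sense of the abstract would track the smaller of the two rates at every horizon, without being told on which side of the crossover it operates.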