Learning to Sparsify Stochastic Linear Bandits

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies stochastic linear bandits in which every action must satisfy a sparsity constraint, so that selecting the best sparse action in a high-dimensional action space can be NP-hard because of the problem's combinatorial structure. The authors propose an adaptively phased exploration-exploitation framework that combines ordinary least squares parameter estimation with a sparse action selection subroutine: an exact one when the action set is a Euclidean ball, and a greedy one for general action sets. The analysis gives Õ(d√T) regret for Euclidean ball action sets, Õ(d√T) α-regret for strongly convex action sets, and Õ(d T^{2/3}) α-regret for general compact sets, where d is the action dimension, T is the time horizon, and α relates to the approximation ratio of the greedy subroutine. Experiments, including a recommendation system application, are used to validate the algorithms.
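For readers unfamiliar with the two performance measures named above, the block below writes them out under the usual convention in the approximation-bandit literature. The symbols $\theta^\star$ (unknown parameter), $\mathcal{X}$ (action set), $k$ (sparsity budget), and $x_t$ (action played in round $t$) are assumptions introduced here for illustration; the paper's exact definitions (for example, whether an expectation is taken) may differ.

```latex
% Best k-sparse action in hindsight (assumed notation)
x^\star \in \arg\max_{x \in \mathcal{X},\; \|x\|_0 \le k} \langle \theta^\star, x \rangle

% Cumulative regret over T rounds
R(T) = \sum_{t=1}^{T} \big( \langle \theta^\star, x^\star \rangle - \langle \theta^\star, x_t \rangle \big)

% alpha-regret: the benchmark is scaled by the approximation ratio alpha in (0, 1]
R_\alpha(T) = \sum_{t=1}^{T} \big( \alpha \, \langle \theta^\star, x^\star \rangle - \langle \theta^\star, x_t \rangle \big)
```

With $\alpha = 1$ the second quantity reduces to the first, which is consistent with plain regret being reported for the Euclidean ball case, where the optimal sparse action can be computed exactly.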

📝 Abstract

This paper addresses the problem of learning to sparsify stochastic linear bandits, where a decision-maker sequentially selects actions from a high-dimensional space subject to a sparsity constraint on the number of nonzero elements in the action vector. The key challenge lies in minimizing cumulative regret while tackling the potential NP-hardness of finding optimal sparse actions due to the inherent combinatorial structure of the problem. We propose an adaptively phased exploration and exploitation algorithmic framework, utilizing ordinary least squares for parameter learning and specialized subroutines for sparse action selection. When the action set is a Euclidean ball, optimal sparse actions can be efficiently computed, enabling us to establish a $\tilde{\mathcal{O}}(d\sqrt{T})$ regret bound, where $d$ is the dimension of the action vector and $T$ is the time horizon length. For general convex and compact action sets where finding optimal sparse actions is intractable, we employ a greedy subroutine. For general strongly convex action sets, we derive a $\tilde{\mathcal{O}}(d \sqrt{T})$ $\alpha$-regret bound; for general compact sets lacking strong convexity, we establish a $\tilde{\mathcal{O}}(d T^{2/3})$ $\alpha$-regret bound, where $\alpha$ pertains to the approximation ratio of the greedy algorithm. Finally, we validate the performance of our algorithms using extensive experiments, including an application to a recommendation system.
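To make concrete why the Euclidean ball case is tractable, here is a minimal sketch, not the authors' implementation. It assumes the sparsity constraint is "at most `k` nonzero coordinates" and that the action set is an L2 ball of a given radius. Under those assumptions, the Cauchy-Schwarz inequality implies that the best sparse action for a parameter estimate keeps the `k` largest-magnitude coordinates of that estimate and rescales them to the ball's boundary. The function name `best_sparse_action_on_ball` and the arguments `theta_hat`, `k`, and `radius` are hypothetical names chosen for illustration.

```python
import math


def best_sparse_action_on_ball(theta_hat, k, radius=1.0):
    """Maximize <theta_hat, x> subject to ||x||_2 <= radius and at most k nonzeros.

    Illustrative sketch only; theta_hat stands in for an estimate such as
    the one ordinary least squares would produce.
    """
    d = len(theta_hat)
    if not 0 < k <= d:
        raise ValueError("k must satisfy 0 < k <= len(theta_hat)")

    # For a fixed support S, the best value is radius * ||theta_hat restricted to S||_2,
    # so the best support is the k coordinates of largest magnitude.
    support = sorted(range(d), key=lambda i: abs(theta_hat[i]), reverse=True)[:k]
    norm = math.sqrt(sum(theta_hat[i] ** 2 for i in support))

    action = [0.0] * d
    if norm == 0.0:
        # Every feasible action has zero estimated reward; return the origin.
        return action

    # Align the action with the restricted estimate and scale to the boundary.
    for i in support:
        action[i] = radius * theta_hat[i] / norm
    return action


theta_hat = [3.0, -4.0, 0.5, 0.0]
x = best_sparse_action_on_ball(theta_hat, k=2)
estimated_reward = sum(t * a for t, a in zip(theta_hat, x))
print(x, estimated_reward)
```

In this example the selected support is the first two coordinates, and the estimated reward equals the L2 norm of the estimate restricted to that support. For action sets without this symmetry, no such closed form is available in general, which is where the abstract's greedy subroutine and the notion of $\alpha$-regret come in.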

Problem

Research questions and friction points this paper is trying to address.

stochastic linear bandits

sparsity constraint

combinatorial optimization

cumulative regret

high-dimensional action space

Innovation

Methods, ideas, or system contributions that make the work stand out.

sparse linear bandits

adaptive exploration-exploitation

combinatorial optimization