Covariance-adapting algorithm for semi-bandits with application to sparse rewards

📅 2026-04-15

🤖 AI Summary

This work addresses a limitation of existing analyses of stochastic combinatorial semi-bandits: they assume distribution families, such as the sub-Gaussian family, whose parameters must be known in advance but are hard to estimate in practice. The paper instead considers a new general family of sub-exponential distributions that contains both bounded and Gaussian outcomes. It proves a lower bound on the expected regret that is parameterized by the unknown covariance matrix of the outcomes, a tighter quantity than the sub-Gaussian matrix, and constructs an algorithm that relies on covariance estimates, with a tight asymptotic regret analysis. The results are then applied and extended to sparse outcomes, a setting that arises in many recommender systems.

📝 Abstract

We investigate stochastic combinatorial semi-bandits, where the entire joint distribution of outcomes impacts the complexity of the problem instance (unlike in the standard bandits). Typical distributions considered depend on specific parameter values, whose prior knowledge is required in theory but quite difficult to estimate in practice; an example is the commonly assumed sub-Gaussian family. We alleviate this issue by instead considering a new general family of sub-exponential distributions, which contains bounded and Gaussian ones. We prove a new lower bound on the expected regret on this family, that is parameterized by the unknown covariance matrix of outcomes, a tighter quantity than the sub-Gaussian matrix. We then construct an algorithm that uses covariance estimates, and provide a tight asymptotic analysis of the regret. Finally, we apply and extend our results to the family of sparse outcomes, which has applications in many recommender systems.
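
To make the role of covariance estimates concrete, the following is a minimal Python sketch of how a covariance matrix might be estimated from semi-bandit feedback, where only the outcomes of the chosen items are observed in each round. It is not the paper's algorithm: the function names `covariance_from_semi_bandit` and `simulate`, the uniformly random action choice, and the independent Bernoulli(0.05) outcomes are all assumptions made for illustration. The sparse setting is used because the variance of a rarely nonzero outcome is far below 1/4, the generic Hoeffding variance proxy for any variable bounded in [0, 1].

```python
import numpy as np

def covariance_from_semi_bandit(actions, outcomes, d):
    """Estimate pairwise covariances from semi-bandit feedback (illustrative only).

    actions:  list of index arrays, the items chosen in each round (no repeats)
    outcomes: list of arrays, the observed outcomes of those items
    d:        total number of items
    """
    counts = np.zeros((d, d))   # rounds in which items i and j were both observed
    sum_i = np.zeros((d, d))    # sum of X_i over rounds where i and j were both observed
    sum_ij = np.zeros((d, d))   # sum of X_i * X_j over the same rounds
    for idx, x in zip(actions, outcomes):
        idx = np.asarray(idx)
        x = np.asarray(x, dtype=float)
        block = np.ix_(idx, idx)
        counts[block] += 1.0
        sum_i[block] += x[:, None]
        sum_ij[block] += np.outer(x, x)
    n = np.maximum(counts, 1.0)  # avoid division by zero for unseen pairs
    # Empirical E[X_i X_j] - E[X_i] E[X_j], each computed on co-observed rounds.
    cov = sum_ij / n - (sum_i / n) * (sum_i.T / n)
    cov[counts == 0] = np.nan    # pairs never observed together have no estimate
    return cov, counts

def simulate(d=6, m=3, rounds=20000, p=0.05, seed=0):
    """Play uniformly random size-m actions against sparse Bernoulli(p) outcomes."""
    rng = np.random.default_rng(seed)
    actions, outcomes = [], []
    for _ in range(rounds):
        idx = rng.choice(d, size=m, replace=False)  # hypothetical exploration policy
        x = rng.binomial(1, p, size=d)              # full outcome vector (hidden)
        actions.append(idx)
        outcomes.append(x[idx])                     # semi-bandit feedback only
    return covariance_from_semi_bandit(actions, outcomes, d)

cov_hat, counts = simulate()
print("estimated variances:", np.round(np.diag(cov_hat), 4))
print("true variance p(1-p):", 0.05 * 0.95)
print("generic proxy for [0, 1] outcomes:", 0.25)
```

In this toy setting the estimated variances sit close to p(1-p) = 0.0475, roughly five times smaller than the bound of 1/4 that uses only the range of the outcomes. That gap is the kind of slack a covariance-based complexity measure can remove. How such estimates are turned into confidence bounds is specific to the paper and is not reproduced here.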

Problem

Research questions and friction points this paper is trying to address.

stochastic combinatorial semi-bandits

sub-exponential distributions

covariance matrix

sparse rewards

regret analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

covariance-adapting

sub-exponential distributions

combinatorial semi-bandits