🤖 AI Summary
Existing matrix contextual bandit approaches neglect graph-structured relationships between users and items, which makes policy learning less efficient. To address this, we propose a matrix-bandit framework that jointly incorporates low-rank structure and graph priors. It is the first to introduce graph Laplacian regularization into matrix bandits, combining nuclear norm minimization with a graph-based regularizer that models user/item similarities. We further design an efficient algorithm based on a graph-based generalized linear UCB. Theoretically, our method achieves a tighter cumulative regret bound than state-of-the-art approaches. Extensive experiments on synthetic data and multiple real-world recommendation benchmarks demonstrate significant performance gains. Our core innovation lies in the unified modeling of low-rankness and graph structure, which balances expressive power and generalization capability while enabling principled incorporation of relational side information into sequential decision-making under uncertainty.
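As a rough illustration of what combining nuclear norm minimization with a graph Laplacian regularizer can look like, one commonly used form of such an objective is written out below. All symbols are assumptions made for illustration and are not taken from the paper: $\Theta$ is the unknown parameter matrix, $X_t$ and $y_t$ are the context matrix and reward at round $t$, $\ell$ is a loss function, $L_u$ and $L_v$ are Laplacians of the user and item graphs, and $\lambda, \gamma \ge 0$ are tuning parameters. The paper's exact formulation may differ.

```latex
\hat{\Theta} \in \arg\min_{\Theta \in \mathbb{R}^{d_1 \times d_2}}
  \frac{1}{n} \sum_{t=1}^{n} \ell\bigl(y_t, \langle X_t, \Theta \rangle\bigr)
  + \lambda \, \|\Theta\|_{*}
  + \frac{\gamma}{2} \Bigl[ \operatorname{tr}\bigl(\Theta^{\top} L_u \Theta\bigr)
  + \operatorname{tr}\bigl(\Theta L_v \Theta^{\top}\bigr) \Bigr]
```

For an unnormalized Laplacian $L_u = D - A$ with symmetric adjacency matrix $A$, the term $\operatorname{tr}(\Theta^{\top} L_u \Theta)$ equals $\tfrac{1}{2} \sum_{i,j} A_{ij} \, \|\Theta_{i\cdot} - \Theta_{j\cdot}\|_2^2$, so it penalizes differences between the rows of $\Theta$ that belong to connected users. The nuclear norm term $\|\Theta\|_{*}$ encourages low rank.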
📝 Abstract
The matrix contextual bandit (CB), an extension of the well-known multi-armed bandit, is a powerful framework that has been widely applied to sequential decision-making problems with low-rank structure. In many real-world settings, such as online advertising and recommender systems, graph information is available in addition to the low-rank structure: similarity relationships among users or items are naturally captured by the connectivity among nodes in the corresponding graphs. However, existing matrix CB methods fail to make use of such graph information, which makes it difficult for them to produce effective decision-making policies. To fill this void, we propose a novel matrix CB algorithmic framework built on the classical upper confidence bound (UCB) framework. The new framework integrates the low-rank structure and the graph information in a unified manner. Specifically, it first solves a joint nuclear norm and matrix Laplacian regularization problem and then runs a graph-based generalized linear version of the UCB algorithm. Rigorous theoretical analysis shows that our procedure attains a better cumulative regret bound than several popular alternatives, owing to its effective use of graph information. A series of experiments on synthetic and real-world data further illustrates the merits of our procedure.
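The first stage described above, a joint nuclear norm and Laplacian regularized estimate, can be sketched with proximal gradient descent. This is a minimal sketch under stated assumptions, not the paper's algorithm: it assumes a squared loss, uses a toy path graph, and every name in it (`path_laplacian`, `svt`, `objective`, `fit`, `lam`, `gamma`) is hypothetical. The abstract does not specify the loss, the solver, or the form of the graph-based UCB step, so the second stage is not covered here.

```python
import numpy as np


def path_laplacian(n):
    """Unnormalized Laplacian L = D - A of a path graph on n nodes (toy graph)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A


def svt(M, tau):
    """Singular value soft-thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def objective(Theta, X, y, L_row, L_col, lam, gamma):
    """Assumed objective: squared loss + Laplacian smoothness + nuclear norm."""
    resid = np.einsum("tij,ij->t", X, Theta) - y
    graph = np.trace(Theta.T @ L_row @ Theta) + np.trace(Theta @ L_col @ Theta.T)
    return (0.5 * np.mean(resid ** 2)
            + 0.5 * gamma * graph
            + lam * np.linalg.norm(Theta, "nuc"))


def fit(X, y, L_row, L_col, lam=0.05, gamma=0.1, n_iter=200):
    """Proximal gradient descent on the assumed objective, starting from zero."""
    T, d1, d2 = X.shape
    Theta = np.zeros((d1, d2))
    # Upper bound on the Lipschitz constant of the smooth part's gradient.
    G = X.reshape(T, -1)
    lip = (np.linalg.norm(G, 2) ** 2 / T
           + gamma * (np.linalg.norm(L_row, 2) + np.linalg.norm(L_col, 2)))
    step = 1.0 / lip
    for _ in range(n_iter):
        resid = np.einsum("tij,ij->t", X, Theta) - y
        # Gradient of the squared loss plus gradient of the Laplacian penalty.
        grad = (np.einsum("t,tij->ij", resid, X) / T
                + gamma * (L_row @ Theta + Theta @ L_col))
        # Gradient step on the smooth part, then the nuclear norm prox.
        Theta = svt(Theta - step * grad, step * lam)
    return Theta
```

Here `X` is an array of shape `(T, d1, d2)` holding one context matrix per round, `y` holds the `T` observed rewards, and `L_row` and `L_col` stand in for the user-graph and item-graph Laplacians. Splitting the objective into a smooth part (loss plus Laplacian penalty) and a nuclear norm part is a design choice that keeps each iteration to one gradient evaluation and one SVD.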