The Bandit's Blind Spot: The Critical Role of User State Representation in Recommender Systems

📅 2026-04-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses a critical yet often overlooked aspect of contextual multi-armed bandit (CMAB) recommendation research: the quality of user state representations. The authors systematically evaluate how various matrix factorization–based user embeddings, and the strategies used to aggregate them, affect the performance of classical CMAB algorithms. Through large-scale experiments on real-world datasets, they provide the first empirical evidence that improving the user state representation can yield larger performance gains than changing the bandit algorithm alone. Moreover, no single embedding or aggregation method outperforms the others across all settings. The findings underscore that constructing high-quality, domain-adapted state representations is at least as important as algorithm design, and they offer a new perspective on jointly optimizing representation learning and online decision-making in recommender systems.
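To make "aggregation strategy" concrete, the following is a minimal sketch of how a user state vector might be built from the embeddings of the items in a user's history. It is an illustration under stated assumptions, not the authors' implementation: the function name `aggregate_state`, the three strategies (`mean`, `last`, `decay`), the `decay` parameter, and the toy embedding matrix are all hypothetical. In practice the item embeddings would come from a trained matrix factorization model.

```python
import numpy as np


def aggregate_state(item_embeddings, history, strategy="mean", decay=0.8):
    """Build a user state vector from the embeddings of interacted items.

    item_embeddings: array of shape (n_items, dim), e.g. the item factors
        of a matrix factorization model (assumed to be trained elsewhere).
    history: item indices the user interacted with, oldest first.
    strategy: hypothetical names for three simple aggregation choices.
    """
    dim = item_embeddings.shape[1]
    if len(history) == 0:
        # Cold-start user: no interactions, so fall back to a zero vector.
        return np.zeros(dim)

    vectors = item_embeddings[np.asarray(history)]  # (n_interactions, dim)

    if strategy == "mean":
        # Every past interaction contributes equally.
        return vectors.mean(axis=0)
    if strategy == "last":
        # Only the most recent interaction defines the state.
        return vectors[-1]
    if strategy == "decay":
        # Recent interactions weigh more: weight = decay ** age,
        # where the most recent item has age 0.
        ages = np.arange(len(history) - 1, -1, -1)
        weights = decay ** ages
        return (weights[:, None] * vectors).sum(axis=0) / weights.sum()
    raise ValueError(f"unknown strategy: {strategy!r}")


# Toy item embeddings (3 items, 2 latent dimensions), for illustration only.
ITEM_EMBEDDINGS = np.array([[1.0, 0.0],
                            [0.0, 1.0],
                            [1.0, 1.0]])

# The same history yields a different state under each strategy.
states = {
    name: aggregate_state(ITEM_EMBEDDINGS, [0, 1], strategy=name)
    for name in ("mean", "last", "decay")
}
```

The same interaction history produces a different context vector under each strategy, which is why the choice of aggregation can change what the bandit learns even when the algorithm is held fixed.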

📝 Abstract

With the increasing availability of online information, recommender systems have become an important tool for many web-based systems. Because recommendation environments change continuously, these systems increasingly rely on contextual multi-armed bandits (CMAB) to deliver personalized, real-time suggestions. A critical yet underexplored component of these systems is the representation of user state, which typically encapsulates the user's interaction history and strongly shapes both the model's decisions and its learning. In this paper, we investigate the impact of different embedding-based state representations, derived from matrix factorization models, on the performance of traditional CMAB algorithms. Our large-scale experiments reveal that variations in state representation can lead to improvements greater than those achieved by changing the bandit algorithm itself. Furthermore, no single embedding or aggregation strategy consistently dominates across datasets, underscoring the need for domain-specific evaluation. These results expose a substantial gap in the literature and emphasize that advancing bandit-based recommender systems requires a holistic approach that prioritizes embedding quality and state construction alongside algorithmic innovation. The source code for our experiments is publicly available at https://github.com/UFSCar-LaSID/bandits_blind_spot.
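The role the state vector plays inside a contextual bandit can be sketched as follows. This is a minimal, illustrative version of disjoint LinUCB, used here only as one well-known example of a traditional CMAB algorithm; the abstract does not name the algorithms evaluated, and the class name, the `alpha` parameter, and the toy data are assumptions. The point to notice is that the user state is the only input to both arm selection and model updates, so its quality bounds what the bandit can learn.

```python
import numpy as np


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (item)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha  # exploration strength (assumed value)
        # Per-arm design matrix A (starts as identity) and reward vector b.
        self.A = np.stack([np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))

    def select(self, state):
        """Return the arm with the highest upper confidence bound."""
        scores = np.empty(len(self.A))
        for arm in range(len(self.A)):
            A_inv = np.linalg.inv(self.A[arm])
            theta = A_inv @ self.b[arm]  # ridge estimate for this arm
            bonus = self.alpha * np.sqrt(state @ A_inv @ state)
            scores[arm] = theta @ state + bonus
        return int(np.argmax(scores))

    def update(self, arm, state, reward):
        """Fold one observed (state, arm, reward) triple into the model."""
        self.A[arm] += np.outer(state, state)
        self.b[arm] += reward * state


# Toy interaction loop: the user state is a fixed 2-dimensional vector
# standing in for an aggregated embedding (hypothetical values).
bandit = LinUCB(n_arms=2, dim=2, alpha=1.0)
user_state = np.array([1.0, 0.0])
for _ in range(5):
    bandit.update(arm=0, state=user_state, reward=0.0)  # arm 0 never pays off
    bandit.update(arm=1, state=user_state, reward=1.0)  # arm 1 always does
chosen = bandit.select(user_state)
```

Swapping in a different state representation changes `state` everywhere in this loop while leaving the algorithm untouched, which is the kind of comparison the abstract describes.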

Problem

Research questions and friction points this paper is trying to address.

recommender systems

contextual multi-armed bandits

user state representation

embedding quality

matrix factorization

Innovation

Methods, ideas, or system contributions that make the work stand out.

user state representation

contextual multi-armed bandits

embedding quality