Real-Time Personalization with Simple Transformers

📅 2025-03-01
🤖 AI Summary
This paper addresses the tension in real-time personalized recommendation between the high computational cost of transformer models and the insufficient behavioral-modeling capability of embedding-based models. We propose a lightweight and efficient "simple transformer" architecture containing only a single self-attention layer, and prove that this minimal design can capture complex user preferences. We further devise a sub-linear-time, near-optimal recommendation optimization algorithm that drastically reduces inference latency. Evaluated on Spotify and Trivago datasets, our method achieves recommendation accuracy surpassing conventional embedding-based models and matching deeper transformers, while meeting millisecond-level real-time response requirements. Our core contributions are threefold: (1) establishing that a single self-attention layer suffices for expressive user-behavior modeling; (2) unifying high accuracy with high efficiency; and (3) introducing a provably efficient, transformer-based lightweight paradigm designed for real-time recommendation.
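As an illustrative sketch (not the paper's actual parameterization), a "simple transformer" scorer with one self-attention layer can be written in a few lines: the candidate item is appended to the user's interaction history, one round of attention contextualizes it against that history, and a linear readout produces a preference score. All weight matrices and dimensions below are hypothetical placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def simple_transformer_score(history, item, Wq, Wk, Wv, w_out):
    """Score one candidate item against a user's history using a single
    self-attention layer. history: (T, d) embeddings of past interactions;
    item: (d,) candidate embedding. Weights are illustrative placeholders."""
    x = np.vstack([history, item])            # append candidate to the sequence
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    d_k = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d_k))    # (T+1, T+1) attention weights
    h = attn @ V                              # contextualized representations
    return float(h[-1] @ w_out)               # read out the candidate's position

rng = np.random.default_rng(0)
d, T = 8, 5
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
w_out = rng.normal(size=d)
history = rng.normal(size=(T, d))
scores = [simple_transformer_score(history, rng.normal(size=d), Wq, Wk, Wv, w_out)
          for _ in range(3)]
```

With a single attention layer there is only one matrix multiply chain per candidate, which is what makes millisecond-level inference plausible at serving time.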

📝 Abstract
Real-time personalization has advanced significantly in recent years, with platforms utilizing machine learning models to predict user preferences based on rich behavioral data on each individual user. Traditional approaches usually rely on embedding-based machine learning models to capture user preferences, and then reduce the final optimization task to nearest-neighbors, which can be performed extremely fast. However, these models struggle to capture complex user behaviors, which are essential for making accurate recommendations. Transformer-based models, on the other hand, are known for their practical ability to model sequential behaviors, and hence have been intensively used in personalization recently to overcome these limitations. However, optimizing recommendations under transformer-based models is challenging due to their complicated architectures. In this paper, we address this challenge by considering a specific class of transformers, showing its ability to represent complex user preferences, and developing efficient algorithms for real-time personalization. We focus on a particular set of transformers, called simple transformers, which contain a single self-attention layer. We show that simple transformers are capable of capturing complex user preferences. We then develop an algorithm that enables fast optimization of recommendation tasks based on simple transformers. Our algorithm achieves near-optimal performance in sub-linear time. Finally, we demonstrate the effectiveness of our approach through an empirical study on datasets from Spotify and Trivago. Our experiment results show that (1) simple transformers can model/predict user preferences substantially more accurately than non-transformer models and nearly as accurately as more complex transformers, and (2) our algorithm completes simple-transformer-based recommendation tasks quickly and effectively.
Problem

Research questions and friction points this paper is trying to address.

Overcome limitations of traditional embedding-based models in capturing complex user behaviors.
Develop efficient algorithms for real-time personalization using simple transformers.
Achieve near-optimal recommendation performance with sub-linear time complexity.
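The sub-linear-time goal above can be illustrated with a generic two-stage retrieve-then-rerank sketch. This is not the paper's actual algorithm; it only shows the shape of the trade-off: a cheap inner-product pass shrinks the catalog to k candidates, and an exact (but slower) scorer reranks only those. Replacing the brute-force first stage with an approximate nearest-neighbor index makes the total work sublinear in catalog size.

```python
import numpy as np

def shortlist_then_rerank(user_vec, item_embs, rerank_fn, k=10):
    """Two-stage recommendation: a coarse inner-product pass selects k
    candidates, then rerank_fn (e.g. an exact transformer scorer) picks
    the winner. Stage one is brute force here for clarity; an ANN index
    would make it sublinear in the number of items."""
    coarse = item_embs @ user_vec             # cheap scores for all items
    top_k = np.argpartition(-coarse, k)[:k]   # indices of the k best coarse scores
    exact = np.array([rerank_fn(item_embs[i]) for i in top_k])
    return top_k[np.argmax(exact)]            # best item under the exact scorer

rng = np.random.default_rng(1)
items = rng.normal(size=(1000, 16))           # hypothetical item catalog
user = rng.normal(size=16)                    # hypothetical user vector
best = shortlist_then_rerank(user, items, rerank_fn=lambda v: float(v @ user))
```

When the exact scorer agrees with the coarse scores on the shortlist (as in this toy run, where both are the same inner product), the two-stage result matches the full exhaustive search; the paper's contribution is a guarantee of near-optimality under its simple-transformer objective.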
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simple transformers can represent complex user preferences.
An efficient algorithm optimizes recommendations in sub-linear time.
A single self-attention layer suffices for accurate personalization.