🤖 AI Summary
To address the limited generalizability of recommender systems in cross-domain and cold-start scenarios, which stems from overreliance on user/item IDs, this paper proposes RecGPT, the first purely text-driven foundation model for sequential recommendation. The model eschews ID-based representations entirely, deriving item and user representations solely from textual features. It introduces Finite Scalar Quantization (FSQ) as a unified tokenization mechanism that enables cross-domain semantic alignment, and pairs a hybrid bidirectional-causal attention architecture with a catalog-aware beam search decoder to jointly optimize contextual understanding and structured sequence generation. The model supports zero-shot cross-domain transfer and immediate embedding of unseen items. Evaluated on six public benchmarks and industrial datasets, it consistently outperforms state-of-the-art methods, delivering significant gains in cold-start and cross-domain scenarios while enabling real-time, seamless integration of new items.
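The summary's tokenization step can be illustrated with a minimal sketch of Finite Scalar Quantization: each latent dimension of a text-derived embedding is bounded (here via `tanh`) and rounded to one of a small number of levels, and the per-dimension codes combine into a single discrete token id. The `LEVELS` configuration and variable names below are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

# Assumed toy configuration: 3 latent dims, 8 quantization levels each,
# giving a vocabulary of 8*8*8 = 512 discrete tokens.
LEVELS = np.array([8, 8, 8])

def fsq_quantize(z):
    """Bound each latent dim into its level range, then round to an integer code."""
    half = (LEVELS - 1) / 2.0
    bounded = np.tanh(z) * half                   # values in (-half, +half)
    return np.round(bounded + half).astype(int)   # codes in [0, L-1]

def codes_to_token(codes):
    """Map the per-dimension code vector to a single token id (mixed-radix)."""
    token, base = 0, 1
    for code, level in zip(codes, LEVELS):
        token += int(code) * base
        base *= int(level)
    return token

# e.g. a slice of a text-encoder embedding for one item
z = np.array([0.3, -1.2, 2.5])
codes = fsq_quantize(z)        # per-dimension integer codes
token = codes_to_token(codes)  # one discrete token id in [0, 512)
print(codes, token)
```

Because the token is a deterministic function of the item's text embedding, any unseen item can be tokenized immediately without retraining, which is the property the summary highlights.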
📝 Abstract
This work addresses a fundamental barrier in recommender systems: the inability to generalize across domains without extensive retraining. Traditional ID-based approaches fail entirely in cold-start and cross-domain scenarios, where new users or items lack sufficient interaction history. Inspired by foundation models' cross-domain success, we develop RecGPT, a foundation model for sequential recommendation that achieves genuine zero-shot generalization. Our approach departs fundamentally from existing ID-based methods by deriving item representations exclusively from textual features, enabling immediate embedding of any new item without model retraining. We introduce unified item tokenization with Finite Scalar Quantization, which transforms heterogeneous textual descriptions into standardized discrete tokens and thereby eliminates the domain barriers that plague existing systems. The framework further features hybrid bidirectional-causal attention, which captures both intra-item token coherence and inter-item sequential dependencies, and an efficient catalog-aware beam search decoder that enables real-time token-to-item mapping. Unlike conventional approaches confined to their training domains, RecGPT naturally bridges diverse recommendation contexts through its domain-invariant tokenization mechanism. Comprehensive evaluations across six public datasets and industrial scenarios demonstrate consistent performance advantages.
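The hybrid bidirectional-causal attention described above can be sketched as an attention mask: tokens belonging to the same item attend to each other bidirectionally (intra-item coherence), while attention across items stays causal (inter-item sequential order). This is an illustrative reconstruction under those stated assumptions, not the paper's exact formulation; `hybrid_mask` and `item_ids` are names introduced here.

```python
import numpy as np

def hybrid_mask(item_ids):
    """Build a boolean attention mask for a token sequence.

    item_ids[t] is the index of the item that token t belongs to.
    mask[q, k] = True means query position q may attend to key position k.
    """
    ids = np.asarray(item_ids)
    n = len(ids)
    causal = np.tril(np.ones((n, n), dtype=bool))   # causal part: attend to past
    same_item = ids[:, None] == ids[None, :]        # bidirectional within one item
    return causal | same_item

# A toy sequence of 3 items with 2 tokens each:
mask = hybrid_mask([0, 0, 1, 1, 2, 2])
print(mask.astype(int))
```

Each query position sees all past tokens plus the future tokens of its own item, so the model reads a whole item's token group bidirectionally while still generating items left to right.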