🤖 AI Summary
To address the low computational efficiency and limited candidate pool (only hundreds of user-item pairs) of the "text-to-judgment" paradigm in billion-scale cold-start recommendation, this paper proposes a novel "text-to-distribution" paradigm: a single LLM inference directly predicts an item's interaction probability distribution over the entire user population. Key contributions include: (1) a scalable user-vocabulary structure that enables end-to-end training and storage of billion-scale user embeddings; and (2) a joint optimization objective that simultaneously learns distribution prediction and vocabulary construction. Deployed on Alibaba's platform, the system has served cold-start recommendations for two months, processing over one billion cold-start items. Compared to state-of-the-art methods, it achieves over 30× higher inference efficiency, and online A/B testing shows statistically significant improvements in CTR and CVR.
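The core idea, treating each user as a token in a vocabulary so that one forward pass yields a probability distribution over all users, can be illustrated with a minimal PyTorch-style sketch. This is an assumption-laden illustration, not FilterLLM's actual implementation: `UserVocabularyHead`, the dimensions, and the top-k candidate step are all hypothetical, and a real billion-row embedding table would need to be sharded or offloaded rather than held on one device.

```python
# Minimal sketch of the "text-to-distribution" idea (illustrative, not the
# paper's actual code). A head over a user vocabulary plays the role that a
# word vocabulary plays in next-word prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UserVocabularyHead(nn.Module):
    """Maps an LLM hidden state for a cold item to a probability
    distribution over all users in the vocabulary."""
    def __init__(self, hidden_dim: int, num_users: int):
        super().__init__()
        # One embedding row per user; at billion scale this table would be
        # sharded across devices in practice.
        self.user_embeddings = nn.Embedding(num_users, hidden_dim)

    def forward(self, item_hidden: torch.Tensor) -> torch.Tensor:
        # item_hidden: (batch, hidden_dim), the LLM's representation of the
        # cold item's text from a single forward pass.
        logits = item_hidden @ self.user_embeddings.weight.T  # (batch, num_users)
        return F.softmax(logits, dim=-1)  # interaction distribution over users

# Toy usage: 10k users stand in for the billion-scale table.
head = UserVocabularyHead(hidden_dim=768, num_users=10_000)
item_hidden = torch.randn(4, 768)            # stand-in for LLM outputs
dist = head(item_hidden)                     # (4, 10000), rows sum to 1
top_users = dist.topk(k=50, dim=-1).indices  # candidate users per cold item
```

Contrast this with the "text-to-judgment" paradigm, which would require one LLM call per user-item pair and therefore a heavily pre-filtered candidate pool.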
📝 Abstract
Large Language Model (LLM)-based cold-start recommendation systems continue to face significant computational challenges in billion-scale scenarios, as they follow a "Text-to-Judgment" paradigm. This approach processes user-item content pairs as input and evaluates each pair iteratively. To maintain efficiency, existing methods rely on pre-filtering a small candidate pool of user-item pairs. However, this severely limits the inferential capabilities of LLMs by reducing their scope to only a few hundred pre-filtered candidates. To overcome this limitation, we propose a novel "Text-to-Distribution" paradigm, which predicts an item's interaction probability distribution for the entire user set in a single inference. Specifically, we present FilterLLM, a framework that extends the next-word prediction capabilities of LLMs to billion-scale filtering tasks. FilterLLM first introduces a tailored distribution prediction and cold-start framework. Next, FilterLLM incorporates an efficient user-vocabulary structure to train and store the embeddings of billion-scale users. Finally, we detail the training objectives for both distribution prediction and user-vocabulary construction. The proposed framework has been deployed on the Alibaba platform, where it has been serving cold-start recommendations for two months, processing over one billion cold items. Extensive experiments demonstrate that FilterLLM significantly outperforms state-of-the-art methods in cold-start recommendation tasks, achieving over 30 times higher efficiency. Furthermore, an online A/B test validates its effectiveness in billion-scale recommendation systems.
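The abstract mentions joint training objectives for distribution prediction and user-vocabulary construction without giving formulas. The sketch below shows one plausible shape such a joint loss could take; it is a guess for illustration only. The soft multi-label cross-entropy term, the `cf_embeddings` alignment term (pretrained collaborative-filtering user vectors), and the weight `alpha` are all assumptions, not the paper's actual objective.

```python
# Hedged sketch of a joint objective combining distribution prediction with
# user-vocabulary construction (illustrative assumptions, not the paper's loss).
import torch
import torch.nn.functional as F

def joint_loss(logits, interacted_users, user_embeddings, cf_embeddings, alpha=0.1):
    # logits: (batch, num_users) scores from the user-vocabulary head.
    # interacted_users: (batch, num_users) binary matrix of observed interactions.
    # Distribution-prediction term: push probability mass toward the users who
    # actually interacted with each item (soft multi-label cross-entropy).
    target = interacted_users / interacted_users.sum(dim=-1, keepdim=True).clamp(min=1)
    log_probs = F.log_softmax(logits, dim=-1)
    pred_loss = -(target * log_probs).sum(dim=-1).mean()

    # Vocabulary-construction term (assumed): keep user-vocabulary embeddings
    # close to pretrained collaborative embeddings so the vocabulary stays
    # grounded in behavioral signal.
    align_loss = F.mse_loss(user_embeddings, cf_embeddings)

    return pred_loss + alpha * align_loss

# Toy usage: 8 items, 10k users.
logits = torch.randn(8, 10_000)
interactions = (torch.rand(8, 10_000) < 0.001).float()
user_emb = torch.randn(10_000, 768)
cf_emb = torch.randn(10_000, 768)
loss = joint_loss(logits, interactions, user_emb, cf_emb)
```

Whatever the exact formulation, coupling the two terms is what lets the user vocabulary be learned end-to-end while the LLM learns to emit interaction distributions over it.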