🤖 AI Summary
This work addresses the limitations of traditional item-to-item collaborative filtering and two-tower models, whose uniform truncation strategies and late user-item crossing hinder fine-grained capture of user interests. The authors propose PI2I, a two-stage retrieval framework: in the first stage, a relaxed truncation threshold expands the candidate set to improve hit rate; in the second stage, an interactive scoring model replaces inner product computation, and negative samples are constructed from trigger-target item pairs to keep offline training consistent with online inference. PI2I thus combines flexible index construction with personalized interaction modeling. In offline experiments it outperforms classical collaborative filtering and is competitive with two-tower models. Deployed in Taobao's "Guess You Like" section, PI2I achieved a 1.05% increase in online transaction rate. The authors also released a public dataset containing 130 million real-world user interactions.
📝 Abstract
Efficiently selecting relevant content from vast candidate pools is a critical challenge in modern recommender systems. Traditional methods, such as item-to-item collaborative filtering (CF) and two-tower models, often fall short in capturing complex user-item interactions because of uniform truncation strategies and late user-item crossing. To address these limitations, we propose Personalized Item-to-Item (PI2I), a novel two-stage retrieval framework that enhances the personalization capabilities of CF. In the first stage, the Indexer Building Stage (IBS), we optimize the retrieval pool by relaxing truncation thresholds to maximize Hit Rate, thereby temporarily retaining more items that users might be interested in. In the second stage, the Personalized Retrieval Stage (PRS), we introduce an interactive scoring model to overcome the limitations of inner product calculations, allowing for richer modeling of intricate user-item interactions. Additionally, we construct negative samples based on the trigger-target (item-to-item) relationship, ensuring consistency between offline training and online inference. Offline experiments on large-scale real-world datasets demonstrate that PI2I outperforms traditional CF methods and rivals two-tower models. Deployed in the "Guess You Like" section on Taobao, PI2I achieved a 1.05% increase in online transaction rates. In addition, we have released a large-scale recommendation dataset collected from Taobao, containing 130 million real-world user interactions used in the experiments of this paper. The dataset is publicly available at https://huggingface.co/datasets/PI2I/PI2I and could serve as a valuable benchmark for the research community.
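The two-stage flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the co-occurrence index, the `top_k` truncation parameter, the `CATEGORY` table, and the hand-written `toy_scorer` are all hypothetical stand-ins chosen for illustration, and the paper's learned interactive scoring model is replaced here by a simple rule.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_index(sessions, top_k):
    """Stage 1 (IBS-like): co-occurrence item-to-item index, truncated to top_k per trigger.

    A larger top_k plays the role of a relaxed truncation threshold.
    """
    co = defaultdict(Counter)
    for items in sessions:
        for a, b in permutations(set(items), 2):
            co[a][b] += 1
    return {
        trigger: [item for item, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]]
        for trigger, c in co.items()
    }


def retrieve(index, triggers, scorer, user, n):
    """Stage 2 (PRS-like): re-score each (trigger, candidate) pair with a user-aware scorer."""
    best = {}
    for trigger in triggers:
        for cand in index.get(trigger, []):
            if cand in triggers:
                continue  # skip items the user already interacted with
            score = scorer(user, trigger, cand)
            best[cand] = max(best.get(cand, float("-inf")), score)
    return sorted(best, key=lambda c: (-best[c], c))[:n]


# Hypothetical toy data: item categories and one user's category affinities.
CATEGORY = {"a": "shoes", "b": "shoes", "c": "books", "d": "books"}


def toy_scorer(user, trigger, cand):
    """Stand-in for a learned interactive model: user affinity plus a trigger-match bonus."""
    bonus = 0.1 if CATEGORY[trigger] == CATEGORY[cand] else 0.0
    return user["affinity"].get(CATEGORY[cand], 0.0) + bonus


sessions = [["a", "b"], ["a", "b"], ["a", "c"], ["a", "d"]]
user = {"affinity": {"books": 1.0}}

strict_index = build_index(sessions, top_k=1)   # tight truncation
relaxed_index = build_index(sessions, top_k=3)  # relaxed truncation

strict_result = retrieve(strict_index, ["a"], toy_scorer, user, n=2)
relaxed_result = retrieve(relaxed_index, ["a"], toy_scorer, user, n=2)
```

In this toy data, tight truncation keeps only the most frequently co-occurring item `b` for trigger `a`, so the personalized scorer never sees the book items. With the relaxed index, `c` and `d` survive the first stage and the second stage can rank them above `b` for a user who prefers books.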