🤖 AI Summary
To address cold-start item recommendation (i.e., newly introduced or sparsely interacted items that lack collaborative signals), this paper proposes an LLM-driven reinforcement learning framework. It uses a large language model (LLM) as a user behavior simulator to generate high-fidelity synthetic interaction data for cold-start items, and trains a policy-gradient-based reinforcement learning module to select the most informative users for data augmentation, overcoming the limitations of random user sampling and truncated user histories in prior work. The selection policy conditions on users' behavioral features and interaction histories, and is rewarded by cold-start item performance measured after training the recommender on the augmented data. Evaluated on Amazon Product Review datasets, the method achieves significant improvements in cold-start item recall (+12.7%) while remaining efficient and scalable at serving time. The core innovation is a co-optimization paradigm that integrates LLM-based user simulation with adaptive, reward-driven user selection.
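Read as an algorithm, the framework alternates between LLM-simulated augmentation and reward-driven user selection. The sketch below shows only that loop structure; every function and name here is an illustrative placeholder (the paper's prompts, recommender, and reward computation are not reproduced):

```python
import random

def llm_simulate_interaction(user_history, cold_item):
    """Stand-in for prompting an LLM to role-play the user and give feedback on cold_item."""
    return random.random() > 0.5  # placeholder binary feedback

def train_rs_and_eval_recall(augmented_data):
    """Stand-in: retrain the recommender on real + augmented data, return cold-start recall."""
    return random.random()  # placeholder reward signal

def augmentation_round(users, histories, cold_items, select_users):
    selected = select_users(users)  # policy-based selection (vs. random sampling in prior work)
    augmented = [(u, item, llm_simulate_interaction(histories[u], item))
                 for u in selected for item in cold_items]
    reward = train_rs_and_eval_recall(augmented)
    return augmented, reward  # the reward drives the policy update (see the sketch after the abstract)

# Example: one round with a trivial random selector over three users.
data, r = augmentation_round([0, 1, 2], {0: [], 1: [], 2: []}, ["new_item"],
                             lambda us: random.sample(us, 2))
```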
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning, generalization, and simulating human-like behavior across a wide range of tasks. These strengths present new opportunities to enhance traditional recommendation systems (RS), especially in the cold-start item scenario, where newly introduced items lack interactions. Existing works have used LLMs to address cold-start issues in traditional RS through data augmentation, but with notable limitations. One recent work directly addresses this issue by prompting LLMs to generate augmented interaction data between randomly sampled users and cold-start items. It then trains the traditional RS on the augmented data, injecting collaborative signals for cold-start items. Although this approach uses LLMs to provide cold-start items with feedback, it conditions the LLM on only partial user histories, which prevents the LLM from fully emulating the user. Furthermore, randomly selecting users is suboptimal for augmentation. To address these challenges, we leverage the LLM as a user simulator and develop a reinforcement learning (RL) framework that trains a policy to select users for augmentation, optimizing for cold-start item performance after augmented training. The policy model learns to select users for cold-start item data augmentation based on their behavioral features and interaction histories. To optimize user selection for cold-start item performance, we employ a policy gradient method that updates the policy toward actions that yield high rewards. Experiments on Amazon Product Review datasets show substantial gains in cold-start item recall, demonstrating the effectiveness of our method as a scalable, serving-efficient augmentation strategy for modern RS.
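To make the policy-gradient user selection concrete, here is a minimal REINFORCE-style sketch in PyTorch. Everything in it is an illustrative assumption rather than the paper's actual design: the linear scoring policy, the feature dimensions, and the toy recall proxy that stands in for "cold-start performance after augmented training."

```python
import torch

torch.manual_seed(0)
num_users, feat_dim, k = 100, 16, 8

# Toy stand-ins: user behavioral features and a hidden cold-item direction.
user_feats = torch.randn(num_users, feat_dim)
cold_item = torch.randn(feat_dim)

def simulated_recall(selected):
    # Toy proxy for "cold-start recall after augmented training":
    # rewards selections whose features align with the cold item.
    return (user_feats[selected] @ cold_item).mean().item()

policy = torch.nn.Linear(feat_dim, 1)   # scores each user for selection
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for step in range(200):
    probs = torch.softmax(policy(user_feats).squeeze(-1), dim=0)
    dist = torch.distributions.Categorical(probs)
    selected = dist.sample((k,))         # sample k users (with replacement, for simplicity)
    reward = simulated_recall(selected)  # real system: retrain RS on augmented data, measure recall
    # REINFORCE: increase log-probability of selections in proportion to their reward.
    loss = -dist.log_prob(selected).sum() * reward
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In practice one would subtract a baseline from the reward to reduce gradient variance; it is omitted here for brevity.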