Selecting User Histories to Generate LLM Users for Cold-Start Item Recommendation

📅 2025-11-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address inefficient historical data utilization in cold-start item recommendation (e.g., new or sparsely interacted items), this paper proposes an LLM-driven reinforcement learning framework. It introduces a novel user behavior simulator based on large language models (LLMs) to generate high-fidelity synthetic interaction data. A policy-gradient-based reinforcement learning module is designed to dynamically select the most information-rich subset of user historical interactions for data augmentation—overcoming limitations of random sampling and fixed-length history truncation. The method jointly optimizes semantic representation of user behavior and sequential decision-making for history selection. Evaluated on Amazon datasets, it achieves significant improvements in cold-start item recall (+12.7%) while maintaining inference efficiency and scalability. The core innovation lies in the co-optimization paradigm integrating LLM-based user simulation with adaptive historical interaction selection.

Technology Category

Application Category

📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning, generalization, and simulating human-like behavior across a wide range of tasks. These strengths present new opportunities to enhance traditional recommendation systems (RS), especially in the cold-start item scenario where newly introduced items lack interactions. Existing works have used LLMs to address cold-start issues in traditional RS through data augmentation, but they have limitations. One recent work directly addresses this issue by prompting LLMs to generate augmented interaction data between randomly sampled users and cold-start items. Then, they train the traditional RS with augmented data, incorporating collaborative signals for cold-start items. Although they use LLMs to provide cold-start items with feedback, they use partial user histories, which does not allow the LLM to fully emulate the user. Furthermore, randomly selecting users is not optimal for augmentation. To address these challenges, we leverage the LLM as a user and develop a reinforcement learning (RL) framework that trains a policy to select users for augmentation, optimizing for cold-start item performance after augmented training. The policy model learns to select users for cold-start item data augmentation based on their behavioral features and histories. To optimize user selection for cold-start item performance, we employ a policy gradient method that updates the policy in the direction of actions that lead to high rewards. Experiments on Amazon Product Review datasets show substantial gains in cold-start item recall, demonstrating the effectiveness of our method as a scalable, serving-efficient augmentation strategy for modern RS.
Problem

Research questions and friction points this paper is trying to address.

Generates LLM-simulated users for cold-start item recommendations
Selects optimal user histories via reinforcement learning for data augmentation
Improves recall of new items lacking interaction data in recommendation systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM simulates users with full history for cold-start recommendations
Reinforcement learning policy selects optimal users for data augmentation
Policy gradient method optimizes user selection to maximize recommendation performance
🔎 Similar Papers
No similar papers found.
N
Nachiket Subbaraman
University of California, Davis
J
Jaskinder Sarai
University of California, Davis
Aniruddh Nath
Aniruddh Nath
Google, LLC
Machine LearningArtificial Intelligence
Lichan Hong
Lichan Hong
Google DeepMind
Recommendation SystemLLMDeep LearningSocial ComputingVisualization
L
Lukasz Heldt
Google Inc.
L
Li Wei
Google Inc.
Z
Zhe Zhao
University of California, Davis