AI Summary
This work addresses the challenge of content homogenization and filter bubbles in interactive recommendation systems, which often arise from overfitting to users' short-term preferences and hinder the modeling of long-term interest evolution. To this end, the authors propose a novel hierarchical recommendation framework that uniquely integrates large language models (LLMs) with hierarchical reinforcement learning. In this architecture, a high-level policy leverages an LLM for semantic category planning, while a low-level policy employs reinforcement learning to deliver fine-grained personalized recommendations, jointly optimizing long-term user satisfaction. By constructing a semantic-driven action space, the approach effectively mitigates action sparsity and content redundancy. Extensive experiments on real-world datasets demonstrate that the proposed framework significantly outperforms existing methods, achieving superior performance in both long-term user satisfaction and recommendation diversity.
Abstract
Interactive recommender systems can dynamically adapt to user feedback, but often suffer from content homogeneity and filter bubble effects due to overfitting to short-term user preferences. While recent efforts aim to improve content diversity, they predominantly operate in static or one-shot settings, neglecting the long-term evolution of user interests. Reinforcement learning (RL) provides a principled framework for optimizing long-term user satisfaction by modeling sequential decision-making processes. However, its application in recommendation is hindered by sparse, long-tailed user-item interactions and limited semantic planning capabilities. In this work, we propose LLM-Enhanced Reinforcement Learning (LERL), a novel hierarchical recommendation framework that integrates the semantic planning power of large language models (LLMs) with the fine-grained adaptability of RL. LERL consists of a high-level LLM-based planner that selects semantically diverse content categories, and a low-level RL policy that recommends personalized items within the selected semantic space. This hierarchical design narrows the action space, enhances planning efficiency, and mitigates overexposure to redundant content. Extensive experiments on real-world datasets demonstrate that LERL significantly improves long-term user satisfaction compared with state-of-the-art baselines. The implementation of LERL is available at https://github.com/1163710212/LERL.
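The two-level decision process described above can be sketched in a few lines. This is a minimal illustration under assumed interfaces, not the paper's implementation: `llm_plan_category` stands in for the LLM planner (here a simple diversity heuristic over the interaction history), and `rl_policy_recommend` stands in for the learned low-level policy (here a greedy score lookup). All names, the toy catalog, and the preference scores are hypothetical.

```python
# Hypothetical sketch of a hierarchical (planner -> policy) recommendation step.
# The real LERL planner queries an LLM and the low-level policy is trained with RL;
# both are replaced here by simple stand-ins to show the control flow.

CATALOG = {  # toy catalog: category -> item ids (assumed data)
    "sports":  ["s1", "s2", "s3"],
    "music":   ["m1", "m2"],
    "science": ["c1", "c2", "c3"],
}

def llm_plan_category(history, categories):
    """High-level step: pick a semantically diverse category.
    Stand-in for an LLM call: prefer the category seen least in the history."""
    counts = {c: sum(1 for seen in history if seen == c) for c in categories}
    return min(categories, key=lambda c: counts[c])

def rl_policy_recommend(user_pref, items):
    """Low-level step: choose an item within the planned category.
    Stand-in for a learned policy: greedy over per-item preference scores."""
    return max(items, key=lambda i: user_pref.get(i, 0.0))

def recommend(user_pref, history):
    """One hierarchical recommendation step: plan a category, then pick an item.
    The low-level action space shrinks from the whole catalog to one category."""
    category = llm_plan_category(history, list(CATALOG))
    item = rl_policy_recommend(user_pref, CATALOG[category])
    return category, item
```

Note how the hierarchy narrows the action space: the low-level step only ranks the items inside the planned category, which is the mechanism the abstract credits for mitigating action sparsity and redundant exposure.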