🤖 AI Summary
This work addresses sparse and ambiguous user feedback in personalized recommendation, along with the poor calibration, order sensitivity, and popularity bias of direct large language model (LLM) ranking. The authors propose GraphRAG-IRL, a framework that integrates a heterogeneous knowledge graph, maximum-entropy inverse reinforcement learning (IRL), and persona-guided LLM reranking. The approach first retrieves individual and community preference context from the graph and uses it to train an IRL model that produces calibrated initial rankings; an LLM then reranks only a short candidate list under persona-guided prompts, and its semantic judgments are fused with the IRL rankings. The method is presented as the first to combine GraphRAG, IRL, and LLMs. IRL and GraphRAG are reported to be superadditive, with their combined gain exceeding the sum of their individual improvements, and the hybrid design mitigates the reliability limitations of standalone LLM ranking. Experiments show NDCG@10 improvements of 15.7% on MovieLens and 16.6% on KuaiRand over supervised baselines; persona-guided LLM fusion adds up to 16.8% over the IRL-only baseline on MovieLens ml-1m, and score fusion on KuaiRand yields consistent gains of 4–6% across LLM providers.
📝 Abstract
Personalized recommendation requires models that capture sequential user preferences while remaining robust to sparse feedback and semantic ambiguity. Recent work has explored large language models (LLMs) as recommenders and re-rankers, but pure prompt-based ranking often suffers from poor calibration, sensitivity to candidate ordering, and popularity bias. As a result, LLMs are useful semantic reasoners but unreliable as standalone ranking engines.
We present GraphRAG-IRL, a hybrid recommendation framework that combines graph-grounded feature construction, inverse reinforcement learning (IRL), and persona-guided LLM re-ranking. Our method constructs a heterogeneous knowledge graph over items, categories, and concepts, retrieves both individual and community preference context, and uses these signals to train a Maximum Entropy IRL model for calibrated pre-ranking. An LLM is then applied only to a short candidate list, where persona-guided prompts provide complementary semantic judgments that are fused with IRL rankings.
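The two-stage design described above (IRL pre-ranking, then LLM judgments on a short list, then fusion) can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so this example assumes min-max normalization of both score sets followed by a convex combination; the names `fuse_and_rerank`, `min_max`, `alpha`, and `top_n` are hypothetical, and the LLM scores are taken as a precomputed dictionary rather than produced by a real persona-guided prompt.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps to 0.5 everywhere."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.5 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def fuse_and_rerank(
    irl_scores: Dict[str, float],
    llm_scores: Dict[str, float],
    top_n: int = 10,
    alpha: float = 0.7,
) -> List[str]:
    """Rerank the IRL shortlist with a weighted sum of IRL and LLM scores.

    alpha is an assumed fusion weight: 1.0 keeps the IRL order,
    0.0 trusts the LLM judgments alone.
    """
    # Stage 1: the IRL model pre-ranks all candidates; keep a short list.
    shortlist = sorted(irl_scores, key=irl_scores.get, reverse=True)[:top_n]

    # Stage 2: the LLM only ever sees (and scores) the shortlisted items.
    irl_norm = min_max({item: irl_scores[item] for item in shortlist})
    llm_norm = min_max({item: llm_scores[item] for item in shortlist})

    # Stage 3: fuse both signals on a common [0, 1] scale and sort.
    fused = {
        item: alpha * irl_norm[item] + (1.0 - alpha) * llm_norm[item]
        for item in shortlist
    }
    return sorted(shortlist, key=fused.get, reverse=True)


if __name__ == "__main__":
    irl = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.0}
    llm = {"a": 0.0, "b": 1.0, "c": 0.2}  # scores for the top-3 shortlist only
    print(fuse_and_rerank(irl, llm, top_n=3, alpha=0.5))
```

Restricting the LLM to the shortlist keeps prompt length bounded and limits how far order sensitivity or popularity bias in the LLM can move an item, since the IRL scores still anchor the fused ranking.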
Experiments show that GraphRAG-IRL is a strong standalone recommender: IRL-MLP with GraphRAG improves NDCG@10 by 15.7% on MovieLens and 16.6% on KuaiRand over supervised baselines. The results also show that IRL and GraphRAG are superadditive, with the combined gain exceeding the sum of their individual improvements. Persona-guided LLM fusion further improves ranking quality, yielding up to 16.8% NDCG@10 improvement over the IRL-only baseline on MovieLens ml-1m, while score fusion on KuaiRand provides consistent gains of 4–6% across LLM providers.
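To make the reported metrics concrete, the following sketch shows how NDCG@10, a relative improvement, and a superadditivity check could be computed. It assumes binary relevance with linear gain, which the abstract does not confirm; the function names are hypothetical, and the numbers in the usage lines are placeholders chosen for illustration, not results from the paper.

```python
import math
from typing import Sequence, Set


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """NDCG@k with binary relevance: each hit at 0-based position p adds 1/log2(p+2)."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items (at most k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def relative_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, e.g. 0.157 for +15.7%."""
    return (new - base) / base


def is_superadditive(base: float, only_a: float, only_b: float, both: float) -> bool:
    """True when the combined gain exceeds the sum of the two individual gains."""
    return (both - base) > (only_a - base) + (only_b - base)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(ndcg_at_k(["b", "a", "c"], {"a"}))
    print(is_superadditive(base=0.30, only_a=0.31, only_b=0.32, both=0.35))
```

Under this reading, "superadditive" means that adding GraphRAG features and switching to IRL together improve NDCG@10 by more than the two changes would if their separate gains were simply added.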