LLM-CoT Enhanced Graph Neural Recommendation with Harmonized Group Policy Optimization

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address ID sparsity, underutilized textual information, false-negative contamination from random negative sampling, and non-adaptive temperature scaling in graph recommendation, this paper proposes an LLM-CoT-driven semantic ID generation mechanism that maps sparse IDs to dense, semantically rich textual representations. It further designs HGPO, a grouped collaborative reinforcement learning strategy based on a PPO variant, which jointly optimizes negative sampling and temperature-coefficient selection in a node-adaptive manner, thereby improving long-tail performance while preserving inter-group consistency. Extensive experiments on three public benchmarks demonstrate that the method significantly outperforms state-of-the-art baselines in both recommendation accuracy (e.g., Recall@20, NDCG@20) and representation quality (e.g., clustering purity, semantic coherence). The implementation is publicly available.
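The summary does not spell out HGPO's update rule. As an illustration only, the "grouped" aspect can be sketched as a GRPO-style group-relative advantage: rewards are normalized within each node group (e.g., head vs. long-tail) so that no group dominates the policy update. The function name and grouping scheme below are assumptions, not the paper's exact formulation.

```python
import numpy as np

def group_relative_advantages(rewards_by_group):
    """Normalize rewards within each group so optimization pressure is
    comparable across groups (e.g., head vs. long-tail nodes).
    Each group's advantages end up zero-mean with unit-ish variance."""
    advantages = {}
    for group, rewards in rewards_by_group.items():
        r = np.asarray(rewards, dtype=float)
        advantages[group] = (r - r.mean()) / (r.std() + 1e-8)
    return advantages

# Hypothetical per-group rewards for sampled actions
adv = group_relative_advantages({
    "head": [1.0, 0.5, 0.0],
    "tail": [0.2, 0.1, 0.3],
})
```

Normalizing within groups rather than over the whole batch keeps low-reward long-tail groups from receiving uniformly negative advantages, which is one plausible reading of the "inter-group consistency" goal.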

📝 Abstract
Graph neural networks (GNNs) have advanced recommender systems by modeling interaction relationships. However, existing graph-based recommenders rely on sparse ID features and do not fully exploit textual information, resulting in low information density within representations. Furthermore, graph contrastive learning faces two challenges: random negative sampling can introduce false negative samples, and fixed temperature coefficients cannot adapt to the heterogeneity of different nodes. In addition, current efforts to enhance recommendations with large language models (LLMs) have not fully utilized their Chain-of-Thought (CoT) reasoning capabilities to guide representation learning. To address these limitations, we introduce LGHRec (LLM-CoT Enhanced Graph Neural Recommendation with Harmonized Group Policy Optimization). This framework leverages the CoT reasoning ability of LLMs to generate semantic IDs, enriching representations with reasoning-derived semantics and improving their information density and semantic quality. Moreover, we design a reinforcement learning algorithm, Harmonized Group Policy Optimization (HGPO), to optimize negative sampling strategies and temperature coefficients in contrastive learning. This approach enhances long-tail recommendation performance and ensures optimization consistency across different groups. Experimental results on three datasets demonstrate that LGHRec improves representation quality through semantic IDs generated by the LLM's CoT reasoning and effectively boosts contrastive learning with HGPO. Our method outperforms several baseline models. The code is available at: https://anonymous.4open.science/r/LLM-Rec.
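The abstract's "node-adaptive temperature" idea can be made concrete with a minimal InfoNCE sketch in which each anchor node carries its own temperature coefficient instead of a shared constant. This is an illustrative reconstruction under stated assumptions (the function name, shapes, and loss form are not taken from the paper).

```python
import numpy as np

def adaptive_infonce(anchors, positives, negatives, temps):
    """InfoNCE contrastive loss with a per-anchor temperature.
    anchors:   (N, d) anchor node embeddings
    positives: (N, d) one positive per anchor
    negatives: (N, K, d) K sampled negatives per anchor
    temps:     (N,) per-node temperature coefficients"""
    def norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a, p, n = norm(anchors), norm(positives), norm(negatives)
    pos = np.sum(a * p, axis=-1, keepdims=True)   # (N, 1) cosine sims
    neg = np.einsum("nd,nkd->nk", a, n)           # (N, K) cosine sims
    # Per-node temperature scales each anchor's similarity row
    logits = np.concatenate([pos, neg], axis=1) / temps[:, None]
    # Numerically stable log-softmax; the positive sits at index 0
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[:, 0].mean()

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
positives = anchors + 0.1 * rng.normal(size=(4, 8))
negatives = rng.normal(size=(4, 5, 8))
temps = np.array([0.1, 0.2, 0.5, 1.0])  # e.g., lower temp for head nodes
loss = adaptive_infonce(anchors, positives, negatives, temps)
```

A lower temperature sharpens the softmax and penalizes hard negatives more, so letting a policy choose `temps` per node is one way to accommodate node heterogeneity; HGPO would additionally select which negatives enter the pool.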
Problem

Research questions and friction points this paper is trying to address.

Enhances GNN-based recommenders with LLM semantic reasoning
Improves contrastive learning via adaptive negative sampling
Optimizes temperature coefficients for heterogeneous node representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages LLM's CoT reasoning for semantic IDs
Uses HGPO to optimize contrastive learning parameters
Enhances representation quality with semantic enrichment
Hailong Luo
Zhengzhou University
Bin Wu
Zhengzhou University
Hongyong Jia
Zhengzhou University
Qingqing Zhu
nih
Lianlei Shan
University of Chinese Academy of Sciences, Tsinghua University