TrackRec: Iterative Alternating Feedback with Chain-of-Thought via Preference Alignment for Recommendation

📅 2025-08-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Large language models (LLMs) used in recommender systems are prone to hallucination, so their chain-of-thought (CoT) reasoning about user preferences is not guaranteed to be accurate. Method: This paper introduces TrackRec, a framework built on alternating feedback between a generator and a validator. The generator infers a recommendation chain of thought (RecCoT) that describes user preferences and is refined by direct preference optimization (DPO) on the validator's feedback, while the validator is in turn fine-tuned on the generator's inference feedback. Contribution/Results: The resulting RecCoT serves both as an explanation of the recommendation and as auxiliary features for recommendation models. Experiments show that TrackRec surpasses state-of-the-art methods, and it has been deployed on a large advertising platform with hundreds of millions of users, achieving substantial gains.

📝 Abstract

The extensive world knowledge and powerful reasoning capabilities of large language models (LLMs) have attracted significant attention in recommendation systems (RS). In particular, chain-of-thought (CoT) reasoning has been shown to improve the performance of LLMs on complex reasoning tasks for RS. However, because LLMs often hallucinate, there is no guarantee that the CoT they produce is effective. A key challenge is therefore to further enhance the recommendation capabilities of LLMs through effective CoT reasoning. To this end, we propose TrackRec, a framework designed to enhance the reasoning capabilities of LLMs for RS. TrackRec focuses on accurately inferring a recommendation CoT (RecCoT) for user preferences using the knowledge in LLMs. This RecCoT can serve both as an explanation of how the LLM completes a recommendation task and as auxiliary features that assist recommendation models. TrackRec consists of a RecCoT generator $(G)$ and a RecCoT validator $(V)$. We further design an alternating feedback learning mechanism in which $G$ undergoes direct preference optimization using feedback from $V$, so that it produces increasingly accurate RecCoT aligned with $V$'s standards. Meanwhile, $V$ is fine-tuned using the inference feedback from $G$ to enhance its validation capabilities in alignment with recommendation tasks. Through iterative alternating feedback learning between $G$ and $V$, TrackRec continuously improves the user preference analysis capability of $G$ and the validation capacity of $V$. Extensive experiments demonstrate the effectiveness of our approach, showing that it surpasses state-of-the-art methods. Moreover, TrackRec has been deployed on a large advertising platform with hundreds of millions of users, achieving substantial gains.
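
The alternating feedback between $G$ and $V$ described in the abstract can be pictured with a toy loop. This is only a minimal sketch under stated assumptions, not the paper's implementation: `ToyGenerator`, `ToyValidator`, the string-matching `score` rule, and the way preference pairs and validator examples are built are all hypothetical stand-ins. In the real system both components would be LLMs, and the update steps would be DPO training and fine-tuning rather than list appends.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGenerator:
    """Hypothetical stand-in for G; a real G would be an LLM."""
    dpo_pairs: list = field(default_factory=list)

    def sample(self, history):
        # One candidate reasoning chain per item in the behavior history.
        return [f"the user is likely interested in {item}" for item in history]

    def dpo_update(self, pairs):
        # Placeholder for a DPO step on (chosen, rejected) pairs.
        self.dpo_pairs.extend(pairs)


@dataclass
class ToyValidator:
    """Hypothetical stand-in for V; a real V would be a fine-tuned LLM."""
    train_examples: list = field(default_factory=list)

    def score(self, cot, target):
        # Assumed rule: a chain is good if it supports the observed target item.
        return 1.0 if target in cot else 0.0

    def fine_tune(self, examples):
        # Placeholder for fine-tuning V on G's inference outputs.
        self.train_examples.extend(examples)


def alternating_feedback(g, v, data, rounds=2):
    """Alternate between updating G from V's feedback and V from G's outputs."""
    for _ in range(rounds):
        pairs, examples = [], []
        for history, target in data:
            candidates = g.sample(history)
            ranked = sorted(candidates, key=lambda c: v.score(c, target), reverse=True)
            best, worst = ranked[0], ranked[-1]
            # Keep a preference pair only when V actually separates the two chains.
            if v.score(best, target) > v.score(worst, target):
                pairs.append((best, worst))
            examples.extend((c, target) for c in candidates)
        g.dpo_update(pairs)    # G learns from V's feedback
        v.fine_tune(examples)  # V learns from G's inference feedback
    return g, v
```

Calling `alternating_feedback(ToyGenerator(), ToyValidator(), [(["shoes", "books"], "books")])` runs two rounds on a single toy user whose next interaction is assumed to be `"books"`.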

Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM reasoning for recommendation systems via CoT

Addressing hallucination issues in LLM-generated recommendation reasoning

Aligning user preference inference with validation through iterative feedback

Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative alternating feedback mechanism between generator and validator

Direct preference optimization for accurate reasoning chain generation

Chain-of-Thought alignment with user preferences for recommendations
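
The direct preference optimization step named above can be illustrated with the standard DPO objective for a single preference pair. This is a minimal sketch, assuming the usual formulation with a frozen reference model. The function name, argument names, and the default `beta=0.1` are illustrative choices, and this page does not state which variant or hyperparameters TrackRec uses.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # How much more the policy favors each response than the reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) = log(1 + exp(-z)), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy and reference agree, the loss equals log 2. It falls below that value as the policy shifts probability toward the chosen RecCoT and away from the rejected one.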

🔎 Similar Papers

End-to-End Learnable Item Tokenization for Generative Recommendation