Tagging the Thought: Unlocking Personalization Reasoning via Reinforcement Learning

📅 2025-09-27

📈 Citations: 0

✨ Influential: 0

career value

200K/year

🤖 AI Summary

Large language models (LLMs) excel at general-purpose reasoning but struggle to model individual user preferences and generate personalized responses. To address this, we propose TagPR—a novel framework that introduces *thought tagging*, the first method to construct structured, interpretable personalized reasoning chains. TagPR further designs a multi-stage reinforcement learning mechanism integrating tag-based constraints with user embeddings, and proposes the PRMU reward model, which leverages composite reward signals for fine-grained alignment optimization. Evaluated on the LaMP benchmark and a newly curated dataset, TagPR achieves substantial improvements over state-of-the-art methods, with an average gain of 32.65%. These results empirically validate that structured reasoning is pivotal for enhancing personalization capability in LLMs.

Technology Category

Application Category

📝 Abstract

Recent advancements have endowed Large Language Models (LLMs) with impressive general reasoning capabilities, yet they often struggle with personalization reasoning - the crucial ability to analyze user history, infer unique preferences, and generate tailored responses. To address this limitation, we introduce TagPR, a novel training framework that significantly enhances an LLM's intrinsic capacity for personalization reasoning through a tagging the thought approach. Our method first develops a data-driven pipeline to automatically generate and semantically label reasoning chains, creating a structured dataset that fosters interpretable reasoning. We then propose a synergistic training strategy that begins with Supervised Fine-Tuning (SFT) on this tagged data to establish foundational reasoning patterns, followed by a multi-stage reinforcement learning (RL) process. This RL phase is guided by a unique composite reward signal, which integrates tag-based constraints and a novel Personalization Reward Model with User Embeddings (PRMU) to achieve fine-grained alignment with user-specific logic. Extensive experiments on the public LaMP benchmark and a self-constructed dataset demonstrate that our approach achieves state-of-the-art results, delivering an average improvement of 32.65% over the base model across all tasks. Our work validates that structured, interpretable reasoning is a highly effective pathway to unlocking genuine personalization capabilities in LLMs.

Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' personalization reasoning through structured tagging approach

Developing interpretable reasoning chains via supervised and reinforcement learning

Improving user-specific response alignment with composite reward signals

Innovation

Methods, ideas, or system contributions that make the work stand out.

TagPR framework enhances personalization reasoning via tagging

Multi-stage RL with composite reward for user alignment

Data-driven pipeline generates semantically labeled reasoning chains

🔎 Similar Papers

No similar papers found.