🤖 AI Summary
This work addresses the tension in large language models between generating personalized responses and maintaining factual accuracy, since excessive personalization often compromises correctness. To reconcile this conflict, the authors propose PersonaDual, a framework that, for the first time, unifies general objective reasoning and personalized reasoning within a single model. PersonaDual uses supervised fine-tuning to train dual reasoning pathways and introduces DualGRPO, a new reinforcement learning algorithm, to optimize an adaptive mechanism that dynamically selects the more suitable reasoning path. Experiments show that PersonaDual substantially mitigates interference between objective and personalized tasks, approaching the performance upper bound achievable without interference, while exploiting helpful personalization signals to improve factual question answering.
📝 Abstract
As users increasingly expect LLMs to align with their preferences, personalized information becomes increasingly valuable. However, it is a double-edged sword: it can improve interaction quality but may compromise objectivity and factual correctness, especially when it is misaligned with the question at hand. To alleviate this problem, we propose PersonaDual, a framework that supports both general-purpose objective reasoning and personalized reasoning in a single model and adaptively switches between the two modes based on context. PersonaDual is first trained with SFT to learn the two reasoning patterns, and then further optimized via reinforcement learning with our proposed DualGRPO to improve mode selection. Experiments on objective and personalized benchmarks show that PersonaDual preserves the benefits of personalization while reducing interference: it achieves near interference-free performance and better leverages helpful personalized signals to improve objective problem-solving.
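The paper does not spell out DualGRPO's internals here, but standard GRPO scores each sampled response by standardizing its reward within its sampling group. A dual-mode variant could plausibly compare groups sampled under the objective and personalized reasoning modes to reward the better mode choice. The sketch below is illustrative only: the function names and the reward values are hypothetical, not from the paper.

```python
# Illustrative sketch (NOT the paper's implementation) of a GRPO-style
# group-relative advantage, plus a toy mode-preference signal that a
# dual-mode variant might use. All names and numbers are hypothetical.
from statistics import mean, pstdev

def group_advantages(rewards):
    """Standardize rewards within one sampling group (GRPO-style):
    advantage_i = (r_i - mean(group)) / std(group)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        # All responses scored equally: no preference signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Hypothetical rewards for responses sampled under each reasoning mode
# on a factual question where personalization hurts correctness.
objective_rewards = [0.9, 0.7, 0.8, 0.4]
personalized_rewards = [0.3, 0.5, 0.2, 0.4]

adv_obj = group_advantages(objective_rewards)
adv_per = group_advantages(personalized_rewards)

# A simple mode-selection signal: prefer the mode with higher mean reward.
better_mode = ("objective"
               if mean(objective_rewards) > mean(personalized_rewards)
               else "personalized")
print(better_mode)  # here the objective mode wins
```

Within each group the standardized advantages sum to zero, so the policy is pushed toward the above-average responses of each mode; the cross-mode comparison is what would steer the adaptive switch.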