🤖 AI Summary
Current large language models typically represent user preferences in an implicit, black-box, and model-specific manner, lacking interpretability and cross-task transferability. This work proposes using natural language as a universal, task-agnostic preference interface to construct interpretable, reusable, and continuously evolving user preference descriptions. By integrating supervised fine-tuning with high-quality synthetic data and reinforcement learning, we design a two-stage training framework that optimizes long-term utility and transfer performance, yielding the AlignXplore+ preference reasoning model. Our method achieves state-of-the-art results across nine benchmarks, with an 8B-parameter model outperforming larger open-source counterparts, demonstrating exceptional generalization across tasks, model families, and interaction formats.
📝 Abstract
We study the problem of personalization in large language models (LLMs). Prior work predominantly represents user preferences as implicit, model-specific vectors or parameters, yielding opaque ``black-box'' profiles that are difficult to interpret and transfer across models and tasks. In contrast, we advocate natural language as a universal, model- and task-agnostic interface for preference representation. This formulation leads to interpretable and reusable preference descriptions, while naturally supporting continual evolution as new interactions are observed. To learn such representations, we introduce a two-stage training framework that combines supervised fine-tuning on high-quality synthesized data with reinforcement learning to optimize long-term utility and cross-task transferability. Based on this framework, we develop AlignXplore+, a universal preference reasoning model that generates textual preference summaries. Experiments on nine benchmarks show that our 8B model achieves state-of-the-art performance, outperforming substantially larger open-source models, while exhibiting strong transferability across tasks, model families, and interaction formats.