🤖 AI Summary
Existing LLM alignment methods focus on static, single-turn, or universal value alignment, failing to model users’ long-term personalized preferences or to address cold-start challenges. This paper proposes PersonalAgent—the first proactive dialogue agent to formalize cross-session personalization as a sequential reasoning task. It integrates LLM-driven dialogue decomposition, temporal preference modeling, dynamic user-profile updating, and reinforcement learning–based policy optimization to track preference evolution and mitigate cold start. Its key innovation is a cross-session, consistency-preserving sequential alignment framework that is inherently robust to dialogue noise. Experiments demonstrate significant improvements over prompt-based and policy-optimization baselines under both ideal and noisy dialogue conditions, and human evaluation confirms that PersonalAgent achieves state-of-the-art naturalness and consistency in preference understanding.
📝 Abstract
The deployment of Large Language Models (LLMs) in interactive systems necessitates deep alignment with the nuanced and dynamic preferences of individual users. Current alignment techniques predominantly target universal human values or static, single-turn preferences, thereby failing to address the critical needs of long-term personalization and the initial user cold-start problem. To bridge this gap, we propose PersonalAgent, a novel user-centric lifelong agent designed to continuously infer and adapt to user preferences. PersonalAgent constructs and dynamically refines a unified user profile by decomposing dialogues into single-turn interactions, framing preference inference as a sequential decision-making task. Experiments show that PersonalAgent outperforms strong prompt-based and policy-optimization baselines, not only in idealized but also in noisy conversational contexts, while preserving cross-session preference consistency. Furthermore, human evaluation confirms that PersonalAgent excels at capturing user preferences naturally and coherently. Our findings underscore the importance of lifelong personalization for developing more inclusive and adaptive conversational agents. Our code is available here.
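The sequential profile-update loop the abstract describes (decompose a dialogue into single-turn signals, then fold each signal into a unified user profile) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the class and function names (`UserProfile`, `process_session`) are hypothetical, and the confidence-blending rule stands in for the paper's LLM-driven inference and RL-optimized policy.

```python
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    # preference key -> (value, confidence); structure is illustrative only
    preferences: dict = field(default_factory=dict)

    def update(self, key: str, value: str, weight: float = 0.3) -> None:
        """Blend one single-turn preference signal into the profile.

        Repeated evidence for the same value raises confidence; conflicting
        evidence lowers it, and sustained conflict flips the stored value,
        modeling preference drift across sessions.
        """
        old_value, old_conf = self.preferences.get(key, (value, 0.0))
        if old_value == value:
            self.preferences[key] = (value, min(1.0, old_conf + weight))
        else:
            conf = old_conf - weight
            if conf < 0:
                # accumulated contrary evidence outweighs the old preference
                self.preferences[key] = (value, -conf)
            else:
                self.preferences[key] = (old_value, conf)

def process_session(profile: UserProfile, turns) -> UserProfile:
    """Apply a session's turn-level signals sequentially.

    `turns` is a list of (key, value) pairs standing in for the
    LLM-based dialogue decomposition and preference extraction step.
    """
    for key, value in turns:
        profile.update(key, value)
    return profile

profile = UserProfile()
process_session(profile, [("cuisine", "thai"), ("cuisine", "thai"), ("budget", "low")])
```

A cold-start user simply begins with an empty profile that gains confidence as evidence accumulates, and later sessions reuse the same profile object, which is what gives the loop its cross-session consistency.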