🤖 AI Summary
Existing LLM personalization methods suffer from high computational overhead, strong data dependency, severe catastrophic forgetting, and limited capability in modeling multi-turn interactions and implicit user preferences; moreover, they lack evaluation benchmarks grounded in real user behavior. This paper frames personalization as a preference-driven local model editing task and proposes a parameter-efficient editing framework that clusters user preference representations to enable precise, minimally disruptive model updates. Contributions include: (1) UPQA, the first preference QA benchmark driven by real user queries, designed specifically for information retrieval-oriented personalization evaluation; and (2) empirical results demonstrating that the proposed editing method outperforms full fine-tuning in both accuracy and efficiency on UPQA and on multi-turn implicit reasoning tasks, while significantly surpassing prompt-engineering baselines.
📝 Abstract
Personalization is becoming indispensable for LLMs to align with individual user preferences and needs. Yet current approaches are often computationally expensive, data-intensive, susceptible to catastrophic forgetting, and prone to performance degradation in multi-turn interactions or when handling implicit queries. To address these challenges, we conceptualize personalization as a model editing task and introduce Personalization Editing, a framework that applies localized edits guided by clustered preference representations. This design enables precise preference-aligned updates while preserving overall model capabilities. In addition, existing personalization benchmarks frequently rely on persona-based dialogs between LLMs rather than on real user-LLM interactions, or focus primarily on stylistic imitation while neglecting information-seeking tasks that require accurate recall of user-specific preferences. We introduce User Preference Question Answering (UPQA), a short-answer QA dataset constructed from in-situ user queries with varying levels of difficulty. Unlike prior benchmarks, UPQA directly evaluates a model's ability to recall and apply specific user preferences. Across experimental settings, Personalization Editing achieves higher editing accuracy and greater computational efficiency than fine-tuning, and outperforms prompting-based baselines in multi-turn conversation and implicit preference question settings.
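To make the core idea concrete, the sketch below illustrates the general shape of preference-clustered local editing: cluster user preference embeddings, route a preference to its nearest cluster, and apply a rank-1 update so the edited weight maps that cluster's representation to a preference-aligned target while leaving the rest of the matrix nearly untouched. This is a minimal illustration only, not the paper's actual implementation; all function names, dimensions, and the k-means/rank-1 choices are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Tiny k-means over preference embeddings (illustrative stand-in
    for the paper's clustering of user preference representations)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each embedding to its nearest center
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def rank1_edit(W, k_vec, v_vec):
    """Localized rank-1 update: (W + delta) @ k_vec == v_vec exactly,
    while delta has rank 1, so the change to W is minimal in spirit."""
    residual = v_vec - W @ k_vec
    return W + np.outer(residual, k_vec) / (k_vec @ k_vec)

rng = np.random.default_rng(0)
prefs = rng.normal(size=(30, 8))        # mock user preference embeddings
centers, labels = kmeans(prefs, k=3)

W = rng.normal(size=(8, 8))             # mock weight matrix to edit
new_pref = prefs[0]                     # a preference to inject
c = int(np.argmin(((centers - new_pref) ** 2).sum(-1)))  # route to cluster
target = rng.normal(size=8)             # mock preference-aligned output
W_edited = rank1_edit(W, centers[c], target)
```

The rank-1 form keeps the edit local: only directions overlapping the selected cluster's representation are affected, which mirrors the abstract's goal of preference-aligned updates that preserve overall model capabilities.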