AI Summary
Current large language models (LLMs) are predominantly aligned via population-level preference modeling, rendering them ill-suited to individual users' expressive strategies, personality traits, and writing styles.
Method: We propose a hypothesis-driven, interpretable paradigm for personalized alignment that infers user-specific stylistic and personality hypotheses from minimal user samples, then enables zero-parameter-update, fine-grained personalization via meta-prompt engineering, few-shot inference, and context-augmented prompting.
Contribution/Results: Our approach eliminates conventional aggregation-based fine-tuning. In deliberative alignment tasks, it improves helpfulness by up to 70% on average; in authorship attribution, it achieves win rates above 90% against state-of-the-art fine-tuning methods. Moreover, it demonstrates robust cross-domain and cross-model generalization.
Abstract
Alignment algorithms are widely used to align large language models (LLMs) to human users based on preference annotations that reflect their intended real-world use cases. Typically, these (often divergent) preferences are aggregated over a diverse set of users, resulting in fine-tuned models that are aligned to the "average-user" preference. Nevertheless, current models are used by individual users in very specific contexts and situations, emphasizing the need for user-dependent preference control. In this work we address the problem of personalizing LLM outputs to their users, aiming to generate customized responses tailored to individual users instead of generic outputs that emulate the collective voices of diverse populations. We propose a novel, interpretable, and sample-efficient hypotheses-driven personalization approach (HyPerAlign): given few-shot examples written by a particular user, we first infer hypotheses about their communication strategies, personality, and writing style, then prompt LLM models with these hypotheses and user-specific attributes to generate customized outputs. We conduct experiments on two different personalization tasks, authorship attribution and deliberative alignment, with datasets from diverse domains (news articles, blog posts, emails, jailbreaking benchmarks), and demonstrate the superiority of our hypotheses-driven personalization approach when compared to preference-based fine-tuning methods. For deliberative alignment, the helpfulness of LLM models is improved by up to 70% on average. For authorship attribution, results indicate consistently high win rates (commonly >90%) against state-of-the-art preference fine-tuning approaches for LLM personalization across diverse user profiles and LLM models. Overall, our approach represents an interpretable and sample-efficient strategy for the personalization of LLM models to individual users.
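The two-stage pipeline described above (infer user hypotheses from few-shot samples, then condition generation on them) can be sketched in prompt-construction form. This is a minimal illustrative sketch, not the paper's implementation: the function names, prompt templates, and example samples are all hypothetical assumptions.

```python
# Hypothetical sketch of hypotheses-driven personalization (HyPerAlign-style).
# Prompt templates and names are illustrative, not from the paper.

def build_hypothesis_prompt(user_samples: list[str]) -> str:
    """Stage 1: ask an LLM to infer hypotheses about the user's
    communication strategies, personality, and writing style."""
    joined = "\n---\n".join(user_samples)
    return (
        "Below are writing samples from a single user:\n"
        f"{joined}\n\n"
        "Infer hypotheses about this user's communication strategies, "
        "personality traits, and writing style."
    )

def build_personalized_prompt(hypotheses: str, task: str) -> str:
    """Stage 2: condition generation on the inferred hypotheses,
    with no parameter updates to the underlying model."""
    return (
        f"User profile hypotheses:\n{hypotheses}\n\n"
        f"Task: {task}\n"
        "Respond in this user's voice, consistent with the hypotheses above."
    )

# Example usage with hypothetical few-shot samples and hypotheses.
samples = [
    "Honestly, I reckon the deadline is optimistic at best.",
    "Quick thought: let's keep the memo short and plain.",
]
stage1_prompt = build_hypothesis_prompt(samples)
stage2_prompt = build_personalized_prompt(
    "Informal, direct, prefers brevity and colloquial phrasing.",
    "Draft a short status update email.",
)
```

Each stage's prompt would be sent to an LLM; the sketch only shows how user samples and inferred hypotheses flow into the prompts, which is what makes the approach interpretable and free of fine-tuning.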