🤖 AI Summary
Existing personalized alignment research is hindered by coarse-grained preference modeling and the absence of fine-grained, individual-level preference datasets. To address this, we introduce WikiPersona—the first fine-grained, real-world personalized alignment benchmark built upon extensively documented public figures. It requires models not only to generate responses aligned with a given persona’s preferences but also to produce verifiable background and preference descriptions, enabling interpretable alignment. Innovatively treating personas as prototypical individuals, we propose Preference Prefix Injection: a fine-tuning strategy that conditions generation on inferred individual preferences as prefix tokens. This yields a 12.3% improvement in alignment accuracy under preference conflict scenarios and significantly enhances cross-persona generalization and fairness. Comprehensive experiments—including few-shot prompting, supervised fine-tuning, and multi-dimensional evaluation—demonstrate superior verifiability, robustness, and personalized consistency over baselines.
📝 Abstract
Preference alignment has become a standard pipeline for finetuning models to follow *generic* human preferences. The majority of work seeks to optimize models to produce responses that would be preferable *on average*, simplifying the diverse and often *contradicting* space of human preferences. While research has increasingly focused on personalized alignment, i.e., adapting models to individual user preferences, there is a lack of personalized preference datasets that focus on nuanced individual-level preferences. To address this, we introduce WikiPersona: the first fine-grained personalization benchmark built on well-documented, famous individuals. Our dataset challenges models to align with these personas through an interpretable process: generating verifiable textual descriptions of a persona's background and preferences in addition to alignment. We systematically evaluate different personalization approaches and find that while few-shot prompting with preferences and fine-tuning fail to simultaneously ensure effectiveness and efficiency, using *inferred personal preferences* as prefixes enables effective personalization, especially on topics where preferences clash, while leading to more equitable generalization across unseen personas.
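The core idea of conditioning on inferred preferences as a prefix can be illustrated with a minimal sketch. The template wording, function name, and preference text below are illustrative assumptions, not the paper's exact prompt format:

```python
def build_prefixed_prompt(inferred_preferences: str, question: str) -> str:
    """Prepend an inferred preference description as prefix text, so that
    generation is conditioned on the persona's individual preferences.

    NOTE: the bracketed template below is a hypothetical format for
    illustration; the actual prefix construction may differ.
    """
    prefix = f"[Persona preferences] {inferred_preferences}\n"
    return prefix + f"[Question] {question}\n[Answer]"


# Example: the same question yields different conditioning contexts
# depending on which persona's inferred preferences are injected.
prompt = build_prefixed_prompt(
    "Prefers concise, evidence-based answers; dislikes speculation.",
    "Should governments subsidize space exploration?",
)
```

During fine-tuning, such a prefix would be prepended to each training example, so the model learns to treat the preference description as the controlling context for its response.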