🤖 AI Summary
Current large language models (LLMs) struggle to adapt to individual user preferences in creative writing because their training data typically homogenizes diverse tastes. Method: We introduce LiteraryTaste, the first personalized dataset for creative writing, collected from 60 participants, pairing self-reported stated preferences with behaviorally grounded revealed preferences derived from pairwise choices over short texts. We identify systematic discrepancies between these two preference modalities and show that revealed preferences are more predictive and thus better suited for modeling. We propose a transformer-based individual preference modeling framework, augmented with an LLM-driven interpretability pipeline that characterizes fine-grained preference heterogeneity. Contribution/Results: Our method achieves 75.8% accuracy on individual-level and 67.7% on group-level preference prediction. These results validate the dataset's utility and the feasibility of personalized preference modeling, establishing a user-centered paradigm and a foundational resource for preference-aware creative text generation with LLMs.
📝 Abstract
People have different creative writing preferences, and large language models (LLMs) for these tasks can benefit from adapting to each user's preferences. However, these models are typically trained on datasets that treat varying personal tastes as a monolith. To facilitate the development of personalized creative writing LLMs, we introduce LiteraryTaste, a dataset of reading preferences from 60 people, where each person: 1) self-reported their reading habits and tastes (stated preference), and 2) annotated their preferences over 100 pairs of short creative writing texts (revealed preference). With our dataset, we found that: 1) people diverge in their creative writing preferences, 2) finetuning a transformer encoder achieves 75.8% and 67.7% accuracy when modeling personal and collective revealed preferences, respectively, and 3) stated preferences have limited utility for modeling revealed preferences. With an LLM-driven interpretability pipeline, we analyzed how people's preferences vary. We hope our work serves as a cornerstone for personalizing creative writing technologies.
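The revealed-preference setup above (a reader picks one text from each pair, and a model is trained to predict those choices) can be sketched with a toy Bradley-Terry-style logistic scorer standing in for the paper's fine-tuned transformer encoder. Everything below is illustrative: the feature vectors, the simulated "taste" vector, and the hyperparameters are invented for the sketch, not drawn from LiteraryTaste.

```python
import math
import random

random.seed(0)
DIM = 4  # toy feature dimension (assumption; real texts would be encoded by a transformer)

def score(w, x):
    """Linear utility score for a text's feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(pairs, epochs=200, lr=0.5):
    """Fit a Bradley-Terry logistic model.

    pairs: list of (preferred_vec, rejected_vec) tuples, i.e. the
    revealed-preference annotations in pairwise form.
    """
    w = [0.0] * DIM
    for _ in range(epochs):
        for a, b in pairs:
            # P(prefer a over b) = sigmoid(score(a) - score(b))
            p = sigmoid(score(w, a) - score(w, b))
            g = 1.0 - p  # gradient of the log-likelihood w.r.t. the score gap
            for i in range(DIM):
                w[i] += lr * g * (a[i] - b[i])
    return w

def predict(w, a, b):
    """True if the model predicts text a is preferred over text b."""
    return score(w, a) > score(w, b)

# Simulate one reader with a hidden taste vector, then generate 100 pairwise
# choices, mirroring the dataset's 100 annotated pairs per person.
taste = [1.0, 0.5, -0.8, 0.0]

def rand_vec():
    return [random.uniform(-1, 1) for _ in range(DIM)]

pairs = []
for _ in range(100):
    x, y = rand_vec(), rand_vec()
    if score(taste, x) >= score(taste, y):
        pairs.append((x, y))
    else:
        pairs.append((y, x))

w = train(pairs)
acc = sum(predict(w, a, b) for a, b in pairs) / len(pairs)
print(f"training accuracy: {acc:.2f}")
```

In the paper's actual pipeline the linear scorer is replaced by a finetuned transformer encoder operating on the raw text of each pair; the sketch only shows the shape of the pairwise-choice data and the preference-prediction objective.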