Personalizing Text-to-Image Generation to Individual Taste

📅 2026-04-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing text-to-image models, and the reward models used to tune them, typically optimize for average, population-level aesthetics and fail to capture individual users' subjective preferences. To address this limitation, the work introduces PAMELA, a dataset and predictive framework for personalized image evaluation that comprises 70,000 user ratings of 5,000 images generated by Flux 2 and Nano Banana. The authors train a personalized reward model jointly on these high-quality annotations and on existing aesthetic assessment data. The model predicts individual liking more accurately than most current state-of-the-art methods predict population-level preferences, and simple prompt optimization guided by this predictor steers generations toward a given user's taste. The dataset and model are publicly released to support research in personalized text-to-image alignment and subjective visual quality assessment.
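
The summary does not describe the reward model's architecture. As a rough illustration of what joint training on personalized ratings and population-level aesthetic data can look like, the sketch below assumes a toy linear scorer over precomputed image features: shared weights learned from every rating, plus a per-user weight offset learned only from that user's ratings. Population-level records carry no user ID, so they update only the shared weights. The class `PersonalizedReward`, the feature vectors, and the SGD loop are hypothetical and are not the paper's method.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical record: (user_id, or None for population-level aesthetic data; features; rating)
Record = Tuple[Optional[str], List[float], float]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class PersonalizedReward:
    """Toy linear reward: shared weights plus a per-user offset (an assumed design)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.shared = [0.0] * dim
        self.user: Dict[str, List[float]] = {}

    def predict(self, user_id: Optional[str], x: List[float]) -> float:
        score = dot(self.shared, x)
        if user_id is not None and user_id in self.user:
            score += dot(self.user[user_id], x)  # user-specific correction
        return score

    def fit(self, data: List[Record], lr: float = 0.05, epochs: int = 500, seed: int = 0) -> None:
        """Plain SGD on squared error over the pooled personalized + population data."""
        rng = random.Random(seed)
        data = list(data)  # copy so shuffling does not mutate the caller's list
        for _ in range(epochs):
            rng.shuffle(data)
            for user_id, x, y in data:
                err = self.predict(user_id, x) - y  # derivative of 0.5 * (score - y) ** 2
                for i in range(self.dim):
                    self.shared[i] -= lr * err * x[i]
                if user_id is not None:
                    w = self.user.setdefault(user_id, [0.0] * self.dim)
                    for i in range(self.dim):
                        w[i] -= lr * err * x[i]


# Population data: feature 0 is liked. Hypothetical user "alice" rates it the other way.
population = [(None, [1.0, 0.0], 1.0), (None, [0.0, 1.0], 0.0), (None, [1.0, 1.0], 1.0)]
alice = [("alice", [1.0, 0.0], -1.0), ("alice", [0.0, 1.0], 0.0), ("alice", [1.0, 1.0], -1.0)]

model = PersonalizedReward(dim=2)
model.fit(population + alice)
print(round(model.predict(None, [1.0, 0.0]), 2), round(model.predict("alice", [1.0, 0.0]), 2))
```

After training, the shared weights reproduce the population trend (a positive score for feature 0), while alice's offset flips the sign for her alone. In practice one would likely replace the hand-made features with embeddings from a pretrained image encoder and the linear scorer with a learned network.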

📝 Abstract

Modern text-to-image (T2I) models generate high-fidelity visuals but remain indifferent to individual user preferences. While existing reward models optimize for "average" human appeal, they fail to capture the inherent subjectivity of aesthetic judgment. In this work, we introduce a novel dataset and predictive framework, called PAMELA, designed to model personalized image evaluations. Our dataset comprises 70,000 ratings across 5,000 diverse images generated by state-of-the-art models (Flux 2 and Nano Banana). Each image is evaluated by 15 unique users, providing a rich distribution of subjective preferences across domains such as art, design, fashion, and cinematic photography. Leveraging this data, we propose a personalized reward model trained jointly on our high-quality annotations and existing aesthetic assessment subsets. We demonstrate that our model predicts individual liking with higher accuracy than the majority of current state-of-the-art methods predict population-level preferences. Using our personalized predictor, we demonstrate how simple prompt optimization methods can be used to steer generations towards individual user preferences. Our results highlight the importance of data quality and personalization to handle the subjectivity of user preferences. We release our dataset and model to facilitate standardized research in personalized T2I alignment and subjective visual quality assessment.
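
The abstract states that simple prompt optimization guided by the personalized predictor can steer generations toward a user's preferences, but does not spell out the procedure here. One minimal form, assumed purely for illustration, is greedy search over style modifiers: repeatedly append the modifier that most increases the predicted score for the target user, and stop when none helps. The function `greedy_prompt_search`, the modifier list, and `toy_score` are hypothetical; in a real pipeline `score` would generate images from the prompt and average the personalized reward over them.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_prompt_search(
    base_prompt: str,
    modifiers: Sequence[str],
    score: Callable[[str], float],
    max_steps: int = 3,
) -> Tuple[str, float]:
    """Greedily append style modifiers while the personalized score improves.

    Hypothetical helper: `score` maps a prompt to one user's predicted liking.
    """
    prompt, best = base_prompt, score(base_prompt)
    remaining: List[str] = list(modifiers)
    for _ in range(max_steps):
        if not remaining:
            break
        # Score every one-modifier extension of the current prompt.
        cand_score, cand_mod = max((score(f"{prompt}, {m}"), m) for m in remaining)
        if cand_score <= best:
            break  # no remaining modifier raises this user's predicted score
        prompt, best = f"{prompt}, {cand_mod}", cand_score
        remaining.remove(cand_mod)
    return prompt, best


# Toy stand-in for "generate images, then apply the personalized reward model".
user_prefs = {"film grain": 1.0, "muted palette": 0.5, "neon lighting": -1.0}


def toy_score(prompt: str) -> float:
    return sum(w for key, w in user_prefs.items() if key in prompt)


print(greedy_prompt_search("a portrait of a dancer", list(user_prefs), toy_score))
# → ('a portrait of a dancer, film grain, muted palette', 1.5)
```

Greedy search is cheap and needs only black-box access to the predictor, which suits a setting where each score requires image generation; it can, however, miss combinations whose benefit only appears jointly.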

Problem

Research questions and friction points this paper is trying to address.

personalization

text-to-image generation

subjective preference

aesthetic judgment

user preference

Innovation

Methods, ideas, or system contributions that make the work stand out.

personalized text-to-image generation

subjective preference modeling

reward model