Learning User Preferences for Image Generation Model

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of modeling fine-grained, dynamic, and heterogeneous user aesthetic preferences—spanning color, style, subject, composition, and other multi-level attributes—in personalized image generation, overcoming limitations of conventional approaches that rely on static user profiles or population-level average preferences. We propose a multimodal large language model–based framework for personalized preference learning. Our method introduces a contrastive preference loss and learnable user preference tokens to jointly model individual specificity and shared group patterns, enabling end-to-end learning of fine-grained preference representations from historical interaction data. Experiments demonstrate that our approach achieves significantly higher preference prediction accuracy than state-of-the-art baselines, enables high-fidelity clustering of aesthetically similar users, and substantially improves alignment between generated images and individual user preferences.

📝 Abstract
User preference prediction requires a comprehensive and accurate understanding of individual tastes. This includes both surface-level attributes, such as color and style, and deeper content-related aspects, such as themes and composition. However, existing methods typically rely on general human preferences or assume static user profiles, often neglecting individual variability and the dynamic, multifaceted nature of personal taste. To address these limitations, we propose an approach built upon Multimodal Large Language Models, introducing contrastive preference loss and preference tokens to learn personalized user preferences from historical interactions. The contrastive preference loss is designed to effectively distinguish between user "likes" and "dislikes", while the learnable preference tokens capture shared interest representations among existing users, enabling the model to activate group-specific preferences and enhance consistency across similar users. Extensive experiments demonstrate our model outperforms other methods in preference prediction accuracy, effectively identifying users with similar aesthetic inclinations and providing more precise guidance for generating images that align with individual tastes. The project page is https://learn-user-pref.github.io/.
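The abstract describes a contrastive preference loss that separates a user's "likes" from "dislikes". A minimal sketch of such an objective, assuming a generic pairwise logistic form over scalar preference scores (the paper's exact formulation, e.g. temperature or negative sampling, may differ):

```python
import numpy as np

def contrastive_preference_loss(liked_scores, disliked_scores, margin=0.0):
    """Pairwise logistic loss: pushes scores of liked images above disliked ones.

    Illustrative sketch only; `margin` and the pairing scheme are assumptions,
    not taken from the paper.
    """
    liked = np.asarray(liked_scores, dtype=float)
    disliked = np.asarray(disliked_scores, dtype=float)
    # All like/dislike pairs: diffs[i, j] = liked[i] - disliked[j] - margin
    diffs = liked[:, None] - disliked[None, :] - margin
    # -log sigmoid(d) = log(1 + exp(-d)): large when a disliked image outscores a liked one
    return float(np.mean(np.log1p(np.exp(-diffs))))
```

With well-separated scores the loss approaches zero; reversing likes and dislikes makes it large, which is the gradient signal that shapes the learned preference representation.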
Problem

Research questions and friction points this paper is trying to address.

Predicting dynamic user preferences for image generation
Capturing multifaceted individual tastes accurately
Enhancing consistency in personalized image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Large Language Models for preference learning
Contrastive preference loss distinguishes likes and dislikes
Learnable preference tokens capture shared user interests
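The innovations above can be illustrated with a toy sketch of how learnable preference tokens might condition the model: a shared pool of token embeddings is maintained across users, and a user-specific subset is prepended to the input sequence. All names and shapes here are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared pool: K preference tokens of dimension D,
# trained jointly so similar users activate overlapping tokens.
K, D = 8, 16
preference_tokens = rng.normal(size=(K, D))  # learnable parameters in a real model

def build_input(user_token_ids, image_embeddings):
    """Prepend a user's preference tokens to the image-embedding sequence,
    mimicking how learnable tokens condition a multimodal LLM."""
    user_tokens = preference_tokens[user_token_ids]         # (k, D)
    return np.concatenate([user_tokens, image_embeddings])  # (k + T, D)

# Example: a user activating tokens 1 and 4, with 5 image embeddings
seq = build_input([1, 4], rng.normal(size=(5, D)))
```

Because the token pool is shared, two users who activate the same tokens receive the same conditioning, which is one way to realize the "consistency across similar users" the abstract describes.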
Wenyi Mo
Rutgers University
Deep Learning · Vision-Language Model · Generative Model
Ying Ba
Renmin University of China
Tianyu Zhang
iN2X
Yalong Bai
iN2X
Biye Li
iN2X