Personalized Reward Modeling for Text-to-Image Generation

📅 2025-11-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current text-to-image (T2I) models lack precise assessment capabilities for individual users’ visual preferences, as generic evaluation metrics fail to capture preference diversity. To address this, we propose PIGReward—the first user-conditioned, interpretable reward model for personalized T2I evaluation. Its core innovations are: (1) chain-of-thought (CoT) reasoning to dynamically generate user-specific evaluation dimensions, and (2) a bootstrapping strategy that constructs rich user context from sparse interaction signals—enabling personalization without user-specific fine-tuning. To rigorously evaluate such models, we introduce PIGBench, the first fine-grained benchmark for user preference modeling in T2I. Extensive experiments demonstrate that PIGReward significantly outperforms existing methods across diverse scenarios, markedly improving alignment between generated images and individual intent. Moreover, it effectively supports feedback-driven prompt optimization, advancing the development of personalized, interactive T2I systems.
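The two-step mechanism described above (bootstrap a user context from sparse signals, then derive user-specific evaluation dimensions and score against them) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the `llm` function is a stub standing in for a reasoning LLM, and all function names, prompts, and the 1-5 rating format are assumptions.

```python
# Hypothetical sketch of a user-conditioned reward pipeline in the spirit of
# PIGReward. All names and the stubbed `llm` call are illustrative.

def llm(prompt: str) -> str:
    """Stand-in for a reasoning LLM; returns canned output for demonstration."""
    if "evaluation dimensions" in prompt:
        return "color palette; level of detail; realism"
    return "color palette: 4\nlevel of detail: 3\nrealism: 5"

def bootstrap_user_context(preferred_examples: list[str]) -> str:
    # Expand sparse interaction signals (a few liked images or captions)
    # into a richer textual user profile.
    return "User tends to prefer: " + "; ".join(preferred_examples)

def generate_dimensions(user_context: str) -> list[str]:
    # CoT-style step: derive evaluation dimensions specific to this user.
    reply = llm(f"Given {user_context}, list evaluation dimensions")
    return [d.strip() for d in reply.split(";")]

def score_image(image_caption: str, dimensions: list[str], user_context: str) -> float:
    # Rate the image on each personalized dimension (here 1-5),
    # then average the per-dimension ratings into a scalar reward.
    reply = llm(f"Rate '{image_caption}' on {dimensions} for {user_context}")
    ratings = [int(line.rsplit(":", 1)[1]) for line in reply.splitlines()]
    return sum(ratings) / len(ratings)

context = bootstrap_user_context(["muted colors", "painterly style"])
dims = generate_dimensions(context)
reward = score_image("a misty harbor at dawn", dims, context)  # 4.0 with the stub
```

Because the dimensions are regenerated per user context rather than fixed, two users scoring the same image can receive different rewards, which is the property that generic metrics lack.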

📝 Abstract
Recent text-to-image (T2I) models generate semantically coherent images from textual prompts, yet evaluating how well they align with individual user preferences remains an open challenge. Conventional evaluation methods, such as general reward functions or similarity-based metrics, fail to capture the diversity and complexity of personal visual tastes. In this work, we present PIGReward, a personalized reward model that dynamically generates user-conditioned evaluation dimensions and assesses images through CoT reasoning. To address the scarcity of user data, PIGReward adopts a self-bootstrapping strategy that reasons over limited reference data to construct rich user contexts, enabling personalization without user-specific training. Beyond evaluation, PIGReward provides personalized feedback that drives user-specific prompt optimization, improving alignment between generated images and individual intent. We further introduce PIGBench, a per-user preference benchmark capturing diverse visual interpretations of shared prompts. Extensive experiments demonstrate that PIGReward surpasses existing methods in both accuracy and interpretability, establishing a scalable and reasoning-based foundation for personalized T2I evaluation and optimization. Taken together, our findings highlight PIGReward as a robust step toward individually aligned T2I generation.
Problem

Research questions and friction points this paper is trying to address.

Evaluating text-to-image model alignment with individual user preferences
Capturing personal visual tastes beyond conventional similarity metrics
Providing personalized feedback for user-specific prompt optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Personalized reward model with dynamic user-conditioned evaluation
Self-bootstrapping strategy using limited reference data
CoT reasoning for personalized feedback and optimization