DyCoRM: Dynamic Criterion-Aware Reward Modeling for Text-to-Image Generation

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing reward models struggle to accommodate users’ dynamic, fine-grained evaluation requirements for text-to-image generation. To address this limitation, this work proposes DyCoRM, a dynamic, criterion-aware reward model that grounds task-relevant evaluation criteria and performs preference learning at the criterion level, with the aim of better aligning generated images with user intent. The authors present it as the first comprehensive framework for dynamic, fine-grained assessment, comprising a dataset with criterion-level annotations (DyCoDataset-20K), an evaluation benchmark derived from it (DyCoBench-1K), and DyCoPick, which applies the reward model to selecting generated images. By integrating dynamic criterion embeddings with alignment techniques and training on large-scale human-annotated data, the reward model is reported to outperform existing approaches on the new benchmark and to improve user satisfaction under personalized requirements.

📝 Abstract

With the continued advancement of text-to-image (T2I) generation, producing high-quality images is becoming increasingly attainable; consequently, user demands are shifting toward images that better satisfy their specific requirements. As reward models play an increasingly important role in assessing whether generated images align with user preference, this trend introduces an important challenge for reward modeling: rather than relying solely on static and general evaluation dimensions, reward models should account for the task-relevant and fine-grained criteria through which users assess whether generated images meet their specific requirements. To address this challenge, we propose DyCoRM, a dynamic, criterion-aware reward model that grounds task-relevant criteria and performs criterion-aware preference comparison. To support this setting, we construct DyCoDataset-20K, which provides dynamic criteria together with criterion-level annotations, and further derive DyCoBench-1K, a benchmark for systematically evaluating reward models under dynamic criteria. We further introduce DyCoPick, which applies criterion-aware reward modeling to selecting T2I images. Our contributions establish the first reward modeling framework for dynamic and fine-grained evaluation and practical application in T2I generation.
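
As an illustration of what criterion-aware image selection could look like, the sketch below scores each candidate image against a list of task-specific criteria and picks the candidate with the highest weighted mean. It is not the authors' DyCoPick procedure: the abstract does not say how criterion-level judgments are aggregated, so the weighted mean, the `pick_best` function, the `ScoreFn` signature, and the toy scores are all assumptions made for this example.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical scorer signature: (prompt, image_id, criterion) -> score in [0, 1].
# A real system would call a criterion-aware reward model here.
ScoreFn = Callable[[str, str, str], float]


def pick_best(
    prompt: str,
    candidates: Sequence[str],
    criteria: Sequence[str],
    score_fn: ScoreFn,
    weights: Optional[Dict[str, float]] = None,
) -> int:
    """Return the index of the candidate with the highest weighted mean score.

    Criteria missing from `weights` default to a weight of 1.0.
    The weighted mean is an assumed aggregation rule, not one from the paper.
    """
    if not candidates or not criteria:
        raise ValueError("need at least one candidate and one criterion")

    w = {c: (weights or {}).get(c, 1.0) for c in criteria}
    total_weight = sum(w.values())
    if total_weight <= 0:
        raise ValueError("criterion weights must sum to a positive value")

    def aggregate(image_id: str) -> float:
        # Score the image on every criterion, then take the weighted mean.
        return sum(w[c] * score_fn(prompt, image_id, c) for c in criteria) / total_weight

    scores = [aggregate(image_id) for image_id in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Toy per-criterion scores standing in for reward-model outputs (made up).
TOY_SCORES = {
    "img_a": {"two cats are visible": 0.9, "watercolor style": 0.2},
    "img_b": {"two cats are visible": 0.6, "watercolor style": 0.8},
}


def toy_score_fn(prompt: str, image_id: str, criterion: str) -> float:
    """Look up a fixed score; the prompt is ignored in this toy scorer."""
    return TOY_SCORES[image_id][criterion]
```

With equal weights, `pick_best("two cats, watercolor", ["img_a", "img_b"], ["two cats are visible", "watercolor style"], toy_score_fn)` selects `img_b` (index 1); raising the weight of "two cats are visible" to 3.0 flips the choice to `img_a` (index 0). This shows how the same candidates can rank differently once the criteria change.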

Problem

Research questions and friction points this paper is trying to address.

reward modeling

text-to-image generation

dynamic criteria

user preference

fine-grained evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic reward modeling

criterion-aware evaluation

text-to-image generation