Bias at the End of the Score

📅 2026-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Reward models are widely used to evaluate text-to-image generation systems, yet their fairness and robustness remain insufficiently validated, so they may embed demographic biases. This work presents a large-scale audit of reward models, combining quantitative and qualitative analyses across stages of the generative pipeline, including data filtering, optimization guidance, and post-processing. The study finds that these models encode demographic biases under ostensibly neutral scoring criteria, and that reward-guided optimization consequently sexualizes female subjects disproportionately, reinforces gender and racial stereotypes, and reduces demographic diversity in generated content. These findings challenge the assumption that reward models are reliable and impartial evaluation metrics for generative AI systems.

📝 Abstract

Reward models (RMs) are inherently non-neutral value functions designed and trained to encode specific objectives, such as human preferences or text-image alignment. RMs have become crucial components of text-to-image (T2I) generation systems, where they are used at various stages: for dataset filtering, as evaluation metrics, as a supervisory signal during parameter optimization, and for post-generation safety and quality filtering of T2I outputs. While specific problems with the integration of RMs into the T2I pipeline have been studied (e.g., reward hacking or mode collapse), their robustness and fairness as scoring functions remain largely unknown. We conduct a large-scale audit of RM robustness with respect to demographic biases during T2I model training and generation. We provide quantitative and qualitative evidence that, while originally developed as quality measures, RMs encode demographic biases, which cause reward-guided optimization to disproportionately sexualize female image subjects, reinforce gender and racial stereotypes, and collapse demographic diversity. These findings highlight shortcomings in current reward models, challenge their reliability as quality metrics, and underscore the need for improved data collection and training procedures to enable more robust scoring.
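
To make the dataset-filtering concern concrete, here is a minimal Python sketch of how a small score gap between groups can shift the demographic composition of a reward-filtered set. It is an illustration only and not the paper's audit procedure: the group labels "A" and "B", the scores, and the helper names `group_shares` and `filter_top_k` are all hypothetical.

```python
from collections import Counter

def group_shares(items):
    """Return each group's share of the items as {group: fraction}."""
    counts = Counter(group for group, _ in items)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

def filter_top_k(items, k):
    """Keep the k highest-scoring (group, score) pairs, as a reward-based filter might."""
    return sorted(items, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical data: group "A" tends to receive slightly higher scores than group "B".
scored = [("A", 0.9), ("A", 0.8), ("A", 0.7), ("A", 0.6),
          ("B", 0.75), ("B", 0.5), ("B", 0.4), ("B", 0.3)]

before = group_shares(scored)                  # balanced: half "A", half "B"
after = group_shares(filter_top_k(scored, 4))  # skewed toward "A" after filtering

print("before filtering:", before)
print("after filtering: ", after)
```

In this toy setup the input is evenly split between the two groups, yet the filtered set is three quarters group "A". Comparing group shares before and after a reward-driven step is one simple way to check for the kind of diversity collapse the abstract describes.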

Problem

Research questions and friction points this paper is trying to address.

reward models

demographic bias

text-to-image generation

fairness

robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

reward models

demographic bias

text-to-image generation