AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing text-to-image reward models rely heavily on large-scale human preference data, which is costly to collect, and the resulting models generalize poorly; meanwhile, handcrafted scoring rules for vision-language judges often fail to capture human preferences accurately. This work proposes AutoRubric-T2I, presented as the first framework that learns scoring rubrics automatically for text-to-image generation. The method synthesizes reasoning traces from a small number of preference pairs into candidate rubrics, then uses a vision-language model to score images under each rubric. An ℓ₁-regularized logistic regression selects the most discriminative subset of rubrics, yielding an interpretable and sample-efficient reward model. AutoRubric-T2I is reported to produce high-quality reward signals using less than 0.01% of the annotated preference data, to outperform strong reward-model baselines on benchmarks such as MMRB2, and to improve diffusion-model generation quality on TIIF and UniGenBench++.
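
One plausible way to write the rubric-selection step, assuming a Bradley-Terry-style logistic model over rubric-score differences. The symbols $K$, $s(\cdot)$, $\Delta_i$, $w$, and $\lambda$ are introduced here for illustration and are not taken from the paper:

$$
\hat{w} = \arg\min_{w \in \mathbb{R}^{K}} \; -\frac{1}{n} \sum_{i=1}^{n} \log \sigma\!\left(w^{\top} \Delta_i\right) + \lambda \lVert w \rVert_1,
\qquad \Delta_i = s(x_i^{+}) - s(x_i^{-})
$$

Here $s(x) \in \mathbb{R}^{K}$ would be the vector of VLM scores for image $x$ under $K$ candidate rubrics, $x_i^{+}$ and $x_i^{-}$ the preferred and rejected images of pair $i$, and $\sigma$ the logistic function. Under this reading, the $\ell_1$ penalty sets the weights of noisy or redundant rubrics to exactly zero, and the surviving rubrics can be ranked by $|\hat{w}_k|$; that ranking rule is an assumption.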

📝 Abstract

Aligning Text-to-Image (T2I) generation models with human preferences increasingly relies on image reward models that score or rank generated images according to prompt alignment and perceptual quality. Existing reward models are commonly trained as Bradley-Terry (BT) preference models on large-scale human preference corpora, making them costly to train, difficult to adapt, and opaque in their evaluation criteria. Meanwhile, Vision-Language Model (VLM) judges can provide more fine-grained assessments through textual rubrics, but their manually designed or heuristically generated scoring rules may fail to reliably reflect human preferences. In this paper, we propose AutoRubric-T2I, the first rubric learning framework in T2I that automatically synthesizes and selects explicit rubrics for guiding VLM judges. AutoRubric-T2I first synthesizes reasoning traces from preference pairs into candidate rubrics, then uses a VLM judge to score paired images under each rubric, producing pairwise rubric-score differences for preference learning. To remove noisy and redundant rules, we further employ an $\ell_1$-Regularized Logistic Regression Refiner, which selects the Top-$N$ most discriminative rubrics. Extensive evaluations show that AutoRubric-T2I produces high-quality, interpretable reward signals using less than 0.01% of the annotated preference data, substantially reducing the need for large-scale reward-model training. On image reward benchmarks such as MMRB2, AutoRubric-T2I outperforms strong reward model baselines. We further validate AutoRubric-T2I as an RL reward on downstream T2I tasks, including TIIF and UniGenBench++, where it improves generation quality over scalar reward models using the Flow-GRPO pipeline on diffusion models.
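
The refiner step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the refiner is a plain logistic regression without intercept on pairwise rubric-score differences, fitted by proximal gradient descent, and that "Top-$N$" means ranking by absolute weight. The rubric names, the synthetic data, and every function name below are hypothetical; in the described pipeline the scores would come from a VLM judge.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|; this is what drives weights to exactly zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(diffs, labels, lam=0.05, lr=1.0, steps=300):
    """Proximal gradient descent for L1-regularized logistic regression.

    diffs[i][k] = score of image A minus score of image B under rubric k.
    labels[i]   = 1 if annotators preferred image A, else 0.
    No intercept is used, so swapping A and B flips the prediction.
    """
    n, k = len(diffs), len(diffs[0])
    w = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for d, y in zip(diffs, labels):
            p = sigmoid(sum(wj * dj for wj, dj in zip(w, d)))
            for j in range(k):
                grad[j] += (p - y) * d[j] / n
        # Gradient step on the logistic loss, then shrinkage for the L1 penalty.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w


def select_top_n(weights, names, n):
    """Keep up to n rubrics with the largest nonzero |weight| (assumed ranking rule)."""
    order = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
    return [names[j] for j in order[:n] if weights[j] != 0.0]


# Hypothetical rubric names, used only for this illustration.
RUBRICS = ["objects_match_prompt", "text_rendered_correctly", "uninformative_rubric"]

# Synthetic preference pairs: the label depends only on the first two rubrics.
random.seed(0)
diffs = [[random.uniform(-1, 1) for _ in RUBRICS] for _ in range(200)]
labels = [1 if d[0] + d[1] > 0 else 0 for d in diffs]

weights = fit_l1_logistic(diffs, labels)
selected = select_top_n(weights, RUBRICS, n=2)
print("weights:", [round(w, 3) for w in weights])
print("selected:", selected)
```

On this toy data the two informative rubrics receive the largest weights, while the uninformative one is shrunk toward zero. One natural way to turn the result into a scalar reward would be the weighted sum of the selected rubric scores for a single image, though the abstract does not state how the final reward is assembled.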

Problem

Research questions and friction points this paper is trying to address.

Text-to-Image Alignment

Reward Model

Human Preferences

Vision-Language Model

Rubric Learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Rubric Learning

Text-to-Image Alignment

Vision-Language Model