Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards

📅 2026-02-28
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work proposes ARC, a framework that improves the alignment of text-to-image generation models with human preferences, factual accuracy, and aesthetic quality without relying on external reward signals. ARC introduces an endogenous confidence score, derived from how accurately the model recovers noise injected into its own generations, and uses it as an unsupervised scalar reward to guide reinforcement-learning-based post-training. By eliminating the need for external reward models, human annotation, or additional datasets, ARC mitigates reward hacking. Experiments show consistent improvements across compositional generation, text rendering, and image-text alignment. Moreover, ARC can be combined with external reward mechanisms, further boosting performance while alleviating reward exploitation.
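
As a rough, hypothetical sketch of the self-denoising probe described above: inject noise into a generated image, ask the model to predict that noise, and score recovery accuracy (negative MSE) as the confidence reward. The function name, noise schedule, and probing scheme below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Illustrative linear-beta DDPM schedule (an assumption; the paper's
# schedule and probe design are not specified in this summary).
T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

@torch.no_grad()
def self_confidence_reward(eps_model, x0, num_probes=4):
    """Score a generated sample x0 by how accurately the model recovers
    noise injected into it: lower MSE between injected and predicted
    noise means higher self-confidence."""
    rewards = []
    for _ in range(num_probes):
        t = torch.randint(0, T, (1,)).item()            # random probe timestep
        eps = torch.randn_like(x0)                      # injected noise
        a = alpha_bar[t]
        x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * eps    # forward diffusion q(x_t | x0)
        eps_hat = eps_model(x_t)                        # model's noise prediction
        rewards.append(-((eps_hat - eps) ** 2).mean())  # negative MSE as confidence
    return torch.stack(rewards).mean()

# Toy epsilon-predictor standing in for a real text-conditioned denoiser
# (a real model would also take the timestep and prompt embedding).
toy_model = nn.Conv2d(3, 3, kernel_size=3, padding=1)
x0 = torch.rand(1, 3, 32, 32) * 2 - 1  # a "generated" image in [-1, 1]
print(f"self-confidence reward: {self_confidence_reward(toy_model, x0).item():.4f}")
```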

📝 Abstract
Text-to-image generation powers content creation across design, media, and data augmentation. Post-training of text-to-image generative models is a promising path to better matching human preferences, improving factuality, and enhancing aesthetics. We introduce ARC (Adaptive Rewarding by self-Confidence), a post-training framework that replaces external reward supervision with an internal self-confidence signal, obtained by evaluating how accurately the model recovers injected noise under self-denoising probes. ARC converts this intrinsic signal into scalar rewards, enabling fully unsupervised optimization without additional datasets, annotators, or reward models. Empirically, by reinforcing high-confidence generations, ARC delivers consistent gains over the baseline in compositional generation, text rendering, and text-image alignment. We also find that integrating ARC with external rewards yields complementary improvements while alleviating reward hacking.
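
A minimal sketch of how such a scalar reward could drive unsupervised post-training, assuming a generic group-normalized, reward-weighted denoising objective; ARC's exact RL recipe is not specified in this summary, and every name below is hypothetical.

```python
import torch
import torch.nn as nn

# Generic reward-weighted fine-tuning step (an assumed form, not
# necessarily ARC's algorithm). Samples with above-average intrinsic
# reward get positive weight; below-average samples get negative weight.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)    # stand-in denoiser
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

def denoising_loss(x0):
    """Standard epsilon-prediction loss at a random noise level."""
    eps = torch.randn_like(x0)
    a = torch.rand(1)                                # stand-in for alpha_bar_t
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * eps
    return ((model(x_t) - eps) ** 2).mean()

# A group of images sampled for one prompt, with their intrinsic rewards
# (random stand-ins here; in practice, from the self-confidence probe).
group = [torch.rand(1, 3, 32, 32) * 2 - 1 for _ in range(4)]
rewards = torch.randn(4)

# Normalize rewards within the group into advantages, then upweight the
# denoising loss of high-confidence samples and downweight the rest.
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
loss = torch.stack([w * denoising_loss(x) for w, x in zip(adv, group)]).sum()

opt.zero_grad()
loss.backward()
opt.step()
print(f"reward-weighted loss: {loss.item():.4f}")
```

Because the advantages are computed within each group of samples, no external reward model or annotated data enters the update, which is what allows this style of optimization to run fully unsupervised.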
Problem

Research questions and friction points this paper is trying to address.

text-to-image generation
post-training
reward learning
unsupervised optimization
text-image alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-confidence reward
unsupervised post-training
text-to-image generation
intrinsic reward
denoising probe