IRIS: Intrinsic Reward Image Synthesis

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scarcity of human preference data that limits Reinforcement Learning from Human Feedback (RLHF) in autoregressive text-to-image (T2I) generation, this paper introduces IRIS—the first framework to optimize autoregressive T2I models using only intrinsic rewards. Its core innovation is identifying and exploiting *self-uncertainty*—a signal that emerges naturally during autoregressive decoding—as an unsupervised reinforcement learning signal. Specifically, IRIS constructs an intrinsic reward by quantifying how token-level prediction confidence evolves over the sequential generation process, eliminating the need for external annotations, human feedback, or pretrained discriminators. Evaluated across multiple benchmarks, IRIS matches or surpasses state-of-the-art RLHF methods in image fidelity and text–image alignment, despite requiring no human preferences. This work establishes a new paradigm for T2I optimization under low-resource conditions.

📝 Abstract
Despite the success of Reinforcement Learning from Human Feedback (RLHF) in language reasoning, its application to autoregressive Text-to-Image (T2I) generation is often constrained by the limited availability of human preference data. This paper explores how an autoregressive T2I model can learn from internal signals without relying on external rewards or labeled data. Contrary to recent findings in text generation, we show that maximizing self-uncertainty, rather than self-certainty, improves image generation. We observe that this is because autoregressive T2I models with low uncertainty tend to generate simple and uniform images, which are less aligned with human preferences. Based on these observations, we propose IRIS (Intrinsic Reward Image Synthesis), the first framework to improve autoregressive T2I models with reinforcement learning using only an intrinsic reward. Empirical results demonstrate that applying IRIS to autoregressive T2I models achieves performance that is competitive with or superior to external rewards.
Problem

Research questions and friction points this paper is trying to address.

Addressing limited human preference data in autoregressive text-to-image generation
Exploring self-uncertainty maximization to improve image generation quality
Developing intrinsic reward framework for reinforcement learning without external data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses intrinsic reward for reinforcement learning
Maximizes self-uncertainty to improve generation
Eliminates need for external human feedback
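The self-uncertainty signal described above can be sketched as a token-level entropy average over the generated image-token sequence. This is a minimal illustrative sketch, not the paper's actual implementation: the function name, the use of mean Shannon entropy, and the toy logits are all assumptions; the paper's reward quantifies the dynamic evolution of prediction confidence, which the sketch approximates with a simple sequence-level average.

```python
import numpy as np

def self_uncertainty_reward(logits):
    """Hypothetical sketch of an intrinsic reward: mean token-level entropy
    over a generated image-token sequence. Per the paper's finding, this
    quantity is *maximized* during RL fine-tuning, since low-uncertainty
    models tend to produce simple, uniform images.

    logits: array of shape (seq_len, vocab_size), one row per decoding step.
    """
    # Numerically stable softmax per decoding step.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Shannon entropy of each token's predictive distribution.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)
    # Intrinsic reward: average uncertainty across the sequence.
    return float(entropy.mean())

# Toy check: sharp (confident) logits score lower than flat (uncertain) ones.
rng = np.random.default_rng(0)
r_sharp = self_uncertainty_reward(rng.normal(size=(16, 1024)) * 10.0)
r_flat = self_uncertainty_reward(rng.normal(size=(16, 1024)) * 0.1)
```

In an RL loop this scalar would stand in for the external preference reward (e.g., as the return in a policy-gradient update); the contrast with text generation, where self-*certainty* is typically maximized, is the paper's central observation.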
Yihang Chen
Computer Science Department, University of California, Los Angeles
Yuanhao Ban
Computer Science Department, University of California, Los Angeles
Yunqi Hong
University of California, Los Angeles
LLM post-training · Multimodal LLM
Cho-Jui Hsieh
University of California, Los Angeles
Machine Learning · Optimization