Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation

📅 2025-10-15
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing reinforcement learning (RL) research for text-to-image (T2I) generation focuses predominantly on diffusion or autoregressive models, overlooking masked generative models, an efficient and promising paradigm. This work pioneers the integration of RL into mask-based T2I generation. The authors propose Mask-GRPO, a framework that formalizes iterative unmasking as a multi-step sequential decision-making process. Crucially, they adapt Group Relative Policy Optimization (GRPO) to mask modeling by redefining the state transition probability, and further strengthen training by removing the KL-divergence constraint, applying a reduction strategy for efficiency, and filtering out low-quality samples. Evaluated on standard T2I benchmarks with Show-o as the base model, Mask-GRPO achieves significant improvements in both image fidelity and human preference alignment, outperforming existing RL-based and supervised baselines. The implementation is publicly available.

πŸ“ Abstract
Reinforcement learning (RL) has garnered increasing attention in text-to-image (T2I) generation. However, most existing RL approaches are tailored to either diffusion models or autoregressive models, overlooking an important alternative: masked generative models. In this work, we propose Mask-GRPO, the first method to incorporate Group Relative Policy Optimization (GRPO)-based RL into this overlooked paradigm. Our core insight is to redefine the transition probability, departing from current approaches, and to formulate the unmasking process as a multi-step decision-making problem. To further enhance our method, we explore several useful strategies, including removing the KL constraint, applying the reduction strategy, and filtering out low-quality samples. Using Mask-GRPO, we improve a base model, Show-o, with substantial improvements on standard T2I benchmarks and preference alignment, outperforming existing state-of-the-art approaches. The code is available at https://github.com/xingzhejun/Mask-GRPO.
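The multi-step decision-making view of unmasking can be sketched as follows. This is a minimal illustration, not the paper's implementation: `model_logits` is a stand-in for the masked generative model (a real model conditions on the text prompt), and `MASK`, `VOCAB`, `SEQ_LEN`, and `STEPS` are placeholder constants. The point is that each unmasking step is one decision whose transition log-probability can be logged and optimized by RL.

```python
import math
import random

random.seed(0)

MASK = -1     # placeholder mask token id
VOCAB = 8     # toy vocabulary size
SEQ_LEN = 6   # toy sequence length
STEPS = 3     # number of unmasking decisions per trajectory

def model_logits(tokens):
    # Stand-in for the masked generative model: random logits per
    # position. A real model would condition on the prompt and the
    # partially unmasked sequence.
    return [[random.gauss(0, 1) for _ in range(VOCAB)] for _ in tokens]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def unmask_rollout(steps=STEPS):
    """One trajectory: unmask all SEQ_LEN positions over `steps`
    decisions, logging the per-step transition log-probability that a
    GRPO-style objective would reweight."""
    tokens = [MASK] * SEQ_LEN
    per_step = SEQ_LEN // steps
    logps = []
    for _ in range(steps):
        logits = model_logits(tokens)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        chosen = random.sample(masked, per_step)  # positions to reveal
        step_logp = 0.0
        for i in chosen:
            probs = softmax(logits[i])
            tok = random.choices(range(VOCAB), weights=probs)[0]
            tokens[i] = tok
            step_logp += math.log(probs[tok])
        logps.append(step_logp)
    return tokens, logps

tokens, logps = unmask_rollout()
```

Each element of `logps` is the log-probability of one state transition; treating these as per-step action log-probabilities is what lets a policy-gradient method such as GRPO credit reward back through the unmasking schedule.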
Problem

Research questions and friction points this paper is trying to address.

Applying reinforcement learning to masked generative models for text-to-image generation
Reformulating unmasking process as multi-step decision-making problem
Improving base model performance on benchmarks and preference alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Applies Group Relative Policy Optimization (GRPO) to masked generative models
Redefines the transition probability to cast unmasking as a multi-step decision problem
Strengthens training via KL-constraint removal, a reduction strategy, and low-quality sample filtering
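The group-relative part of GRPO, combined with the filtering strategy above, can be sketched as a small helper. This is a hedged illustration: the reward-threshold filter (`filter_threshold`) and the mean/std normalization are generic GRPO conventions, and the paper's exact filtering criterion may differ.

```python
def grpo_advantages(rewards, filter_threshold=None):
    """Group-relative advantages: normalize each sample's reward by the
    group's mean and standard deviation (the GRPO baseline). Samples
    below an optional reward threshold are dropped first, mirroring a
    low-quality sample filter."""
    kept = [(i, r) for i, r in enumerate(rewards)
            if filter_threshold is None or r >= filter_threshold]
    vals = [r for _, r in kept]
    mean = sum(vals) / len(vals)
    var = sum((r - mean) ** 2 for r in vals) / len(vals)
    std = var ** 0.5 or 1.0  # guard against a zero-variance group
    return {i: (r - mean) / std for i, r in kept}
```

For a group of trajectory rewards such as `[1.0, 2.0, 3.0, 6.0]`, the kept advantages sum to zero by construction; each trajectory's per-step transition log-probabilities are then scaled by its advantage in the policy update.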