From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of standard Group Relative Policy Optimization (GRPO) in text-to-image generation: evaluating a group of samples under a single condition fails to adequately model inter-sample relationships, which constrains alignment performance. To overcome this, the authors propose Multi-View GRPO (MV-GRPO), which introduces a condition augmentation mechanism that generates semantically similar yet diverse prompts to construct a dense multi-view reward landscape. This extends sparse single-view evaluation into multi-view advantage re-estimation without additional sampling cost. MV-GRPO incorporates a Condition Enhancer module to reshape the probability distribution of the existing samples under the augmented conditions, and integrates the resulting multi-view signals into the GRPO framework for optimization. Experimental results show that MV-GRPO significantly outperforms existing approaches, achieving new state-of-the-art performance in both preference alignment and image generation quality.
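
To make the multi-view advantage re-estimation concrete, here is a minimal sketch in Python of how per-view GRPO advantages could be aggregated into one dense signal per sample. This is an illustration, not the authors' code: the `rewards` layout, the `multi_view_advantages` name, and the mean-over-views aggregation are all assumptions.

```python
# Minimal sketch of multi-view advantage re-estimation (assumed, not the
# paper's code). `rewards` is a (G, V) array: the reward of each of G
# samples in the group under V condition views (the original prompt plus
# V-1 augmented captions).
import numpy as np

def multi_view_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Group-normalize rewards within each view, then average across views.

    rewards: shape (G, V) -- G samples, V condition views.
    Returns: shape (G,)   -- one aggregated advantage per sample.
    """
    # Standard GRPO advantage, computed independently per view (column):
    mean = rewards.mean(axis=0, keepdims=True)       # (1, V)
    std = rewards.std(axis=0, keepdims=True) + eps   # (1, V)
    per_view_adv = (rewards - mean) / std            # (G, V)
    # Dense multi-view signal: average the per-view advantages.
    return per_view_adv.mean(axis=1)                 # (G,)

# Example: 4 samples scored under the original prompt and 2 augmented captions.
rng = np.random.default_rng(0)
adv = multi_view_advantages(rng.uniform(size=(4, 3)))
print(adv)  # one advantage per sample, informed by all three views
```

Averaging per-view normalized advantages is only one plausible aggregation rule; the paper's exact re-estimation scheme may differ.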

📝 Abstract
Group Relative Policy Optimization (GRPO) has emerged as a powerful framework for preference alignment in text-to-image (T2I) flow models. However, we observe that the standard paradigm, in which a group of generated samples is evaluated against a single condition, suffers from insufficient exploration of inter-sample relationships, constraining both alignment efficacy and the performance ceiling. To address this sparse single-view evaluation scheme, we propose Multi-View GRPO (MV-GRPO), a novel approach that enhances relationship exploration by augmenting the condition space to create a dense multi-view reward mapping. Specifically, for a group of samples generated from one prompt, MV-GRPO leverages a flexible Condition Enhancer to generate semantically adjacent yet diverse captions. These captions enable multi-view advantage re-estimation, capturing diverse semantic attributes and providing richer optimization signals. By deriving the probability distribution of the original samples conditioned on these new captions, we incorporate them into training without costly sample regeneration. Extensive experiments demonstrate that MV-GRPO achieves superior alignment performance over state-of-the-art methods.
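
The abstract's efficiency claim rests on re-scoring the original samples under the augmented captions instead of regenerating them. The sketch below shows one way that could enter a PPO-style GRPO objective; `flow_logprob`, `enhance` (standing in for the Condition Enhancer), and the per-view clipped loss are hypothetical, since the paper's exact formulation is not given here.

```python
# Hedged sketch: folding already-generated samples into a clipped GRPO
# objective under augmented captions, with no re-sampling. `flow_logprob`
# is a hypothetical callable returning the model's log-probability of a
# sample (e.g., of its denoising trajectory) conditioned on a caption;
# `enhance` is a stand-in for the paper's Condition Enhancer.
import torch

def mv_grpo_loss(flow_logprob, samples, prompt, enhance,
                 advantages, old_logprobs, clip_eps: float = 0.2):
    """PPO-style clipped objective averaged over condition views.

    samples:      the G already-generated images (reused, not regenerated).
    advantages:   (G, V) per-sample advantage under each of V views.
    old_logprobs: (G, V) log-probs under the behavior policy, ordered as
                  [original prompt, augmented captions...].
    """
    views = [prompt] + list(enhance(prompt))  # original + augmented captions
    losses = []
    for v, caption in enumerate(views):
        # Re-derive the current policy's probability of the *existing*
        # samples conditioned on this caption -- no new sampling needed.
        new_lp = flow_logprob(samples, caption)            # (G,)
        ratio = torch.exp(new_lp - old_logprobs[:, v])     # (G,)
        adv = advantages[:, v]
        unclipped = ratio * adv
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
        losses.append(-torch.minimum(unclipped, clipped).mean())
    return torch.stack(losses).mean()
```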
Problem

Research questions and friction points this paper is trying to address.

preference alignment
text-to-image generation
multi-view evaluation
condition space
sample relationships
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-View GRPO
Augmented Condition Space
Preference Alignment
Flow Models
Condition Enhancer