When Policy Entropy Constraint Fails: Preserving Diversity in Flow-based RLHF via Perceptual Entropy

📅 2026-05-12

🤖 AI Summary

This work addresses the severe collapse in perceptual diversity commonly observed in flow-matching text-to-image models after fine-tuning with reinforcement learning from human feedback (RLHF). Conventional policy entropy cannot regulate diversity in this setting because it stays constant under a fixed, pre-defined noise schedule. To overcome this limitation, the study introduces "perceptual entropy," which quantifies and optimizes diversity in a perceptual space, and establishes its theoretical connection to standard entropy. Two perceptual entropy regularization strategies follow from it: Perceptual Entropy Constraint (PEC) and Perceptual Constraints on Generation Space. The method improves the trade-off between image quality and diversity across multiple base models and reward configurations: PEC attains an overall score of 0.734 (versus 0.366 for the baseline), and a complementary PEC setting raises average diversity to 0.989 (versus 0.047 for the baseline).

📝 Abstract

RLHF is widely used to align flow-matching text-to-image models with human preferences, but often leads to severe diversity collapse after fine-tuning. In RL, diversity is often assumed to correlate with policy entropy, motivating entropy regularization. However, we show that this intuition breaks down in flow models: policy entropy remains constant even while perceptual diversity collapses. We explain this mismatch both theoretically and empirically: the constant entropy arises from the fixed, pre-defined noise schedule, while the diversity collapse is driven by the mode-seeking nature of policy gradients. As a result, policy entropy fails to prevent the model from converging to a narrow high-reward region in the perceptual space. To address this, we introduce perceptual entropy, which captures diversity in a perceptual space while maintaining the property of standard entropy. Building upon this insight, we propose two entropy-regularized strategies, Perceptual Entropy Constraint (PEC) and Perceptual Constraints on Generation Space, to preserve perceptual diversity and improve quality. Experiments across two base models, neural and rule-based rewards, and three perceptual spaces demonstrate consistent gains in the quality-diversity trade-off: PEC achieves the best overall score of 0.734 (vs. 0.366 for the baseline), and a complementary PEC setting further reaches a diversity average of 0.989 (vs. 0.047 for the baseline). Our project page (https://xiaofeng-tan.github.io/projects/PEC) is publicly available.
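The mismatch between constant policy entropy and collapsing diversity can be illustrated with a toy example. The sketch below is not the paper's code and does not use its definition of perceptual entropy. It assumes, for illustration only, that each sampling step is an isotropic Gaussian whose mean comes from the model and whose standard deviation `sigma` is fixed by the noise schedule. The function names, the 8-dimensional space, and the mean pairwise distance used as a diversity proxy are all hypothetical choices.

```python
import math
import random


def gaussian_entropy(dim, sigma):
    """Closed-form entropy of N(mu, sigma^2 I); note that mu does not appear."""
    return 0.5 * dim * math.log(2.0 * math.pi * math.e * sigma ** 2)


def sample(means, sigma, rng):
    """Draw one sample per mean from N(mean, sigma^2 I)."""
    return [[rng.gauss(m, sigma) for m in mean] for mean in means]


def entropy_estimate(samples, means, sigma):
    """Monte Carlo entropy estimate: average negative log-likelihood of the samples."""
    dim = len(means[0])
    log_norm = 0.5 * dim * math.log(2.0 * math.pi * sigma ** 2)
    nll = [
        log_norm + sum((x - m) ** 2 for x, m in zip(xs, ms)) / (2.0 * sigma ** 2)
        for xs, ms in zip(samples, means)
    ]
    return sum(nll) / len(nll)


def mean_pairwise_distance(points):
    """Average Euclidean distance over all pairs; a crude stand-in for diversity."""
    n = len(points)
    total = sum(
        math.dist(points[i], points[j])
        for i in range(n) for j in range(i + 1, n)
    )
    return total / (n * (n - 1) / 2)


rng = random.Random(0)
dim, sigma, n = 8, 0.1, 64  # sigma is fixed by the (assumed) noise schedule

# Hypothetical "before fine-tuning" policy: predicted means are spread out.
spread_means = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n)]
# Hypothetical "after fine-tuning" policy: every mean sits on one high-reward mode.
collapsed_means = [[1.0] * dim for _ in range(n)]

spread_samples = sample(spread_means, sigma, rng)
collapsed_samples = sample(collapsed_means, sigma, rng)

analytic = gaussian_entropy(dim, sigma)
entropy_before = entropy_estimate(spread_samples, spread_means, sigma)
entropy_after = entropy_estimate(collapsed_samples, collapsed_means, sigma)
diversity_before = mean_pairwise_distance(spread_samples)
diversity_after = mean_pairwise_distance(collapsed_samples)

print(f"entropy   before={entropy_before:.3f} after={entropy_after:.3f} analytic={analytic:.3f}")
print(f"diversity before={diversity_before:.3f} after={diversity_after:.3f}")
```

In this toy setting both entropy estimates stay close to the same closed-form value, because the Gaussian entropy depends only on `sigma` and the dimension, while the diversity proxy shrinks sharply once the means collapse. An entropy bonus computed on such a policy would therefore give no signal against the collapse, which is the kind of gap a diversity measure defined in a perceptual space is meant to close.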

Problem

Research questions and friction points this paper is trying to address.

diversity collapse

policy entropy

perceptual diversity

flow-based RLHF

entropy regularization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Perceptual Entropy

Flow-based RLHF

Diversity Preservation