🤖 AI Summary
Existing diversity evaluation in text-to-image (T2I) generation suffers from two structural flaws that arise when contextual constraints are ignored: "over-diversification" (e.g., unauthorized alteration of demographic attributes explicitly specified in the prompt) and "under-diversification" (e.g., insufficient demographic representation). Method: We propose DIVBENCH, the first benchmark framework that systematically distinguishes, quantifies, and jointly evaluates both imbalances via context-aware semantic-fidelity constraints, moving beyond conventional one-directional diversity maximization. The framework integrates LLM-guided FairDiffusion and context-aware prompt rewriting, enabling systematic assessment across mainstream T2I models. Contribution/Results: Experiments reveal widespread under-diversification in current models, while existing diversification methods often over-correct; in contrast, the context-aware strategies evaluated with DIVBENCH substantially improve fairness and representational balance while preserving semantic accuracy, achieving a better trade-off between diversity and fidelity.
📝 Abstract
Current diversification strategies for text-to-image (T2I) models often ignore contextual appropriateness, leading to over-diversification: demographic attributes are modified even when they are explicitly specified in the prompt. This paper introduces DIVBENCH, a benchmark and evaluation framework for measuring both under- and over-diversification in T2I generation. Through systematic evaluation of state-of-the-art T2I models, we find that while most models exhibit limited diversity, many diversification approaches overcorrect by inappropriately altering contextually specified attributes. We demonstrate that context-aware methods, particularly LLM-guided FairDiffusion and prompt rewriting, effectively address under-diversity while avoiding over-diversification, achieving a better balance between representation and semantic fidelity.