🤖 AI Summary
Long-text prompts enhance fidelity in text-to-image (T2I) generation but severely suppress diversity, yielding repetitive and less creative outputs. To address this, we propose PromptMoG, a training-free Mixture-of-Gaussians (MoG) sampling method in the embedding space that increases sampling entropy and generative diversity while preserving semantic consistency via moment matching and semantic regularization. To rigorously evaluate long-prompt generation, we introduce LPD-Bench, the first dedicated benchmark for this task. Extensive experiments across four state-of-the-art models, SD3.5-Large, Flux.1-Krea-Dev, CogView4, and Qwen-Image, demonstrate that PromptMoG significantly improves image diversity under long prompts without inducing semantic drift. Our approach establishes a new paradigm for controllable, diverse, and high-fidelity T2I generation.
📝 Abstract
Recent advances in text-to-image (T2I) generation have achieved remarkable visual outcomes through large-scale rectified flow models. However, how these models behave under long prompts remains underexplored. Long prompts encode rich content, spatial, and stylistic information that enhances fidelity but often suppresses diversity, leading to repetitive and less creative outputs. In this work, we systematically study this fidelity-diversity dilemma and reveal that state-of-the-art models exhibit a clear drop in diversity as prompt length increases. To enable consistent evaluation, we introduce LPD-Bench, a benchmark designed for assessing both fidelity and diversity in long-prompt generation. Building on our analysis, we develop a theoretical framework that increases sampling entropy through prompt reformulation and propose a training-free method, PromptMoG, which samples prompt embeddings from a Mixture-of-Gaussians in the embedding space to enhance diversity while preserving semantics. Extensive experiments on four state-of-the-art models, SD3.5-Large, Flux.1-Krea-Dev, CogView4, and Qwen-Image, demonstrate that PromptMoG consistently improves long-prompt generation diversity without semantic drift.
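The core idea of sampling prompt embeddings from a Mixture-of-Gaussians can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name `promptmog_sample`, the use of paraphrase embeddings as component means, the isotropic noise scale `sigma`, and the simple first-moment matching are all assumptions; the paper's moment matching and semantic regularization may differ, and the latter is omitted here.

```python
import numpy as np

def promptmog_sample(paraphrase_embs, base_emb, sigma=0.05, rng=None):
    """Illustrative MoG sampling in prompt-embedding space (assumed interface).

    paraphrase_embs: (K, D) embeddings of reformulated prompts, used here
    as mixture component means. base_emb: (D,) original prompt embedding.
    Component means are shifted so the mixture mean equals base_emb
    (a simple first-moment match standing in for the paper's moment
    matching), then one sample is drawn from an isotropic Gaussian
    around a uniformly chosen component.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = paraphrase_embs.mean(axis=0)
    means = paraphrase_embs - mu + base_emb   # first-moment matching
    k = rng.integers(len(means))              # pick a mixture component
    return means[k] + sigma * rng.normal(size=base_emb.shape)
```

Because the component means are recentered on the original embedding, the expected sampled embedding stays at `base_emb`, so added diversity comes from spread across paraphrases rather than a shift in overall semantics.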