Evolutionary Token-Level Prompt Optimization for Diffusion Models

📅 2026-04-10
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the high sensitivity of text-to-image diffusion models to prompt formulation, which typically forces extensive manual trial and error. The authors propose an automated prompt optimization method that uses a genetic algorithm to directly evolve the token vectors consumed by CLIP-based diffusion models, rather than rewriting the prompt text. The fitness function combines aesthetic quality, measured by the LAION Aesthetic Predictor V2, with image-text alignment, measured by CLIPScore. On 36 prompts from the Parti Prompts (P2) dataset, the approach outperforms both Promptist and random search, achieving up to a 23.93% improvement in fitness. The framework is modular and adaptable to image generation models that use tokenized text encoders.
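The summary does not say how the two scores are combined into one fitness value. As a minimal sketch, assuming a weighted sum of scores rescaled to [0, 1], it might look like the following. The weight `alpha`, the assumed score ranges, and both function names are hypothetical and not taken from the paper.

```python
def normalize(value, low, high):
    """Clamp value to [low, high] and rescale it to [0, 1]."""
    clamped = max(low, min(high, value))
    return (clamped - low) / (high - low)


def combined_fitness(aesthetic, clip_score, alpha=0.5):
    """Weighted sum of aesthetic quality and image-text alignment.

    Assumptions (not from the paper): the aesthetic score lies on a
    1-10 scale, the alignment score on a 0-100 scale, and alpha is the
    weight given to the aesthetic term.
    """
    a = normalize(aesthetic, 1.0, 10.0)
    c = normalize(clip_score, 0.0, 100.0)
    return alpha * a + (1.0 - alpha) * c
```

Rescaling before summing keeps either metric from dominating only because of its numeric range; the actual combination rule used by the authors may differ.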

📝 Abstract
Text-to-image diffusion models exhibit strong generative performance but remain highly sensitive to prompt formulation, often requiring extensive manual trial and error to obtain satisfactory results. This motivates the development of automated, model-agnostic prompt optimization methods that can systematically explore the conditioning space beyond conventional text rewriting. This work investigates the use of a Genetic Algorithm (GA) for prompt optimization by directly evolving the token vectors employed by CLIP-based diffusion models. The GA optimizes a fitness function that combines aesthetic quality, measured by the LAION Aesthetic Predictor V2, with prompt-image alignment, assessed via CLIPScore. Experiments on 36 prompts from the Parti Prompts (P2) dataset show that the proposed approach outperforms the baseline methods, including Promptist and random search, achieving up to a 23.93% improvement in fitness. Overall, the method is adaptable to image generation models with tokenized text encoders and provides a modular framework for future extensions, the limitations and prospects of which are discussed.
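The abstract describes evolving token vectors with a genetic algorithm but gives no operator details. The sketch below is one possible shape of such a loop, assuming tournament selection, token-level uniform crossover, Gaussian mutation, and elitism. All of these operators, the hyperparameters, and the names `evolve` and `toy_fitness` are illustrative assumptions, not the authors' implementation. In a real setup the fitness callable would generate an image from the candidate vectors and score it; here a toy distance-based fitness stands in so the example runs on its own.

```python
import numpy as np


def evolve(fitness, init, pop_size=20, generations=30,
           mutation_rate=0.1, mutation_scale=0.05, seed=0):
    """Toy GA over a (num_tokens, dim) array of token vectors.

    Every operator and hyperparameter here is an assumption made for
    illustration. Requires pop_size >= 3 for the tournament.
    """
    rng = np.random.default_rng(seed)
    # Seed the population with the original vectors plus perturbed copies.
    population = [init] + [
        init + rng.normal(0.0, mutation_scale, init.shape)
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in population])
        # Elitism: the best individual survives unchanged.
        children = [population[int(np.argmax(scores))]]
        while len(children) < pop_size:
            # Tournament selection (size 3) for each of two parents.
            parents = []
            for _ in range(2):
                idx = rng.choice(pop_size, size=3, replace=False)
                parents.append(population[idx[np.argmax(scores[idx])]])
            # Uniform crossover: each token row is copied from one parent.
            mask = rng.random(init.shape[0]) < 0.5
            child = np.where(mask[:, None], parents[0], parents[1])
            # Gaussian mutation applied to a random subset of token rows.
            mutate = rng.random(init.shape[0]) < mutation_rate
            noise = rng.normal(0.0, mutation_scale, init.shape)
            children.append(child + mutate[:, None] * noise)
        population = children
    scores = np.array([fitness(ind) for ind in population])
    best = int(np.argmax(scores))
    return population[best], float(scores[best])


def toy_fitness(vectors):
    """Stand-in fitness: negative squared distance to an all-ones target."""
    return -float(np.sum((vectors - 1.0) ** 2))
```

Calling `evolve(toy_fitness, np.zeros((4, 8)))` returns the best array found and its score. Because the elite is carried over each generation and the fitness is deterministic, the returned score is never worse than that of the starting vectors.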

Problem

Research questions and friction points this paper is trying to address.

prompt optimization
diffusion models
text-to-image generation
token-level optimization
model-agnostic

Innovation

Methods, ideas, or system contributions that make the work stand out.

Genetic Algorithm
Prompt Optimization
Diffusion Models
Token-level Evolution
CLIP-based Conditioning