🤖 AI Summary
Direct Preference Optimization (DPO) for automatic prompt engineering relies on token-level regularization, which leaves semantic inconsistency unchecked and lets optimized prompts deviate from user intent. Method: We propose Sem-DPO, a semantics-aware variant of DPO for text-to-image prompt optimization. Its core innovation is an exponential weighting mechanism, driven by embedding-space cosine similarity, that dynamically rescales the DPO loss; we further derive a theoretical upper bound on semantic drift, guaranteeing that optimized prompts remain within a provably bounded semantic neighborhood of the original intent. Sem-DPO requires no policy gradients or additional training and supports off-policy optimization. Results: Experiments on three benchmarks and two language models demonstrate consistent improvements: +8-12% in CLIP similarity and +5-9% in human-preference scores (HPSv2.1/PickScore), significantly outperforming existing DPO variants and other prompt-optimization methods.
📝 Abstract
Generative AI can now synthesize strikingly realistic images from text, yet output quality remains highly sensitive to how prompts are phrased. Direct Preference Optimization (DPO) offers a lightweight, off-policy alternative to RL for automatic prompt engineering, but its token-level regularization leaves semantic inconsistency unchecked: prompts that win higher preference scores can still drift away from the user's intended meaning.
We introduce Sem-DPO, a variant of DPO that preserves semantic consistency while retaining DPO's simplicity and efficiency. Sem-DPO scales the DPO loss by an exponential weight proportional to the cosine distance between the original prompt and the winning candidate in embedding space, softly down-weighting training signals that would otherwise reward semantically mismatched prompts. We provide the first analytical bound on semantic drift for preference-tuned prompt generators, showing that Sem-DPO keeps learned prompts within a provably bounded neighborhood of the original text. On three standard text-to-image prompt-optimization benchmarks and two language models, Sem-DPO achieves 8-12% higher CLIP similarity and 5-9% higher human-preference scores (HPSv2.1, PickScore) than DPO, while also outperforming state-of-the-art baselines. These findings suggest that strong flat baselines augmented with semantic weighting should become the new standard for prompt-optimization studies, and they lay the groundwork for broader, semantics-aware preference optimization in language models.
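The weighting mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the exact exponential form (here `w = exp(-alpha * cos_distance)`), the temperature `alpha`, and the per-pair log-probability inputs are all assumptions; the abstract only specifies that the DPO loss is rescaled by an exponential weight tied to the cosine distance between the original prompt and the winning candidate in embedding space.

```python
import math

def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def sem_dpo_loss(pi_win, pi_lose, ref_win, ref_lose,
                 orig_emb, win_emb, beta=0.1, alpha=1.0):
    """Single-pair Sem-DPO loss (assumed form).

    Standard DPO loss -log sigmoid(beta * ((pi_w - pi_l) - (ref_w - ref_l)))
    rescaled by w = exp(-alpha * d), where d is the cosine distance between
    the original prompt embedding and the winning candidate's embedding.
    `alpha` is a hypothetical temperature controlling how sharply
    semantically drifted winners are down-weighted.
    """
    logits = beta * ((pi_win - pi_lose) - (ref_win - ref_lose))
    dpo = -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)
    weight = math.exp(-alpha * cosine_distance(orig_emb, win_emb))
    return weight * dpo
```

When the winning prompt's embedding matches the original (`d = 0`) the weight is 1 and the loss reduces to plain DPO; as the winner drifts semantically, the weight decays exponentially, so preference signals from mismatched prompts contribute less to the gradient.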