Boosting Text-to-Image Diffusion Models via Core Token Attention-Based Seed Selection

📅 2026-05-19
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the high sensitivity of text-to-image diffusion models to random seeds, which leads to inconsistent output quality and misalignment with input prompts. The study shows that the cross-attention dynamics of core prompt tokens (the content-bearing words) during the first few denoising steps strongly predict final generation quality. Building on this insight, the authors propose Attention-Based Seed Selection (ABSS), a training-free, inference-time seed scoring and selection mechanism that does not alter the initial noise. By analyzing attention shifts of core prompt tokens, the method ranks all candidate seeds and keeps only the top-k for full generation, without a fixed accept/reject threshold. Across three benchmarks, this plug-and-play strategy consistently improves both text-image alignment and visual quality for Stable Diffusion variants, as corroborated by human preference and alignment metrics.
📝 Abstract
Text-to-image diffusion models can synthesize high-quality images, yet the outcome is notoriously sensitive to the random seed: different initial seeds often yield large variations in image quality and prompt-image alignment. We revisit this "seed effect" and show that attention dynamics over prompt core tokens, the content-bearing words, measured during the first few denoising steps, strongly predict final generation quality. Building on this observation, we introduce Attention-Based Seed Selection (ABSS), a training-free, plug-and-play method that ranks seeds for a given prompt by leveraging cross-attention to core tokens during the denoising process. ABSS requires no finetuning and does not alter the initial noise; it scores and ranks all candidate seeds, keeps only the top-k for full generation, and discards the rest, without relying on a fixed accept/reject threshold. Operating purely at inference time, ABSS can serve as a lightweight pre-selection add-on for existing seed-optimization pipelines, enabling additional gains. Across three benchmarks, extensive experiments show that ABSS enables consistent improvements in text-image alignment and visual quality for Stable Diffusion variants, as corroborated by human preference and alignment metrics.
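The score-rank-keep loop described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `select_top_k_seeds` and `toy_attention_shift_score`, the shape of the attention traces, and the rule that a larger increase in core-token attention means a better seed are all hypothetical, since the abstract does not specify the scoring formula.

```python
from typing import Callable, Dict, List, Sequence


def select_top_k_seeds(
    seeds: Sequence[int],
    score_seed: Callable[[int], float],
    k: int,
) -> List[int]:
    """Score every candidate seed, rank by score, and keep the k best.

    No accept/reject threshold is used: selection is purely by rank.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # One score per distinct seed (duplicates collapse into one entry).
    scores: Dict[int, float] = {seed: score_seed(seed) for seed in seeds}
    # Highest score first; ties keep their original order (sorted is stable).
    ranked = sorted(scores, key=lambda seed: scores[seed], reverse=True)
    return ranked[:k]


def toy_attention_shift_score(trace: Sequence[Sequence[float]]) -> float:
    """Hypothetical stand-in for the paper's score (an assumption).

    `trace[t][i]` is the attention mass on core token i at early denoising
    step t. The score is the mean change from the first to the last step.
    """
    first, last = trace[0], trace[-1]
    return sum(b - a for a, b in zip(first, last)) / len(first)


# Made-up attention traces for four seeds, two core tokens, two early steps.
fake_traces = {
    0: [[0.10, 0.10], [0.12, 0.11]],
    1: [[0.10, 0.10], [0.30, 0.25]],
    2: [[0.10, 0.10], [0.20, 0.15]],
    3: [[0.10, 0.10], [0.05, 0.08]],
}

if __name__ == "__main__":
    kept = select_top_k_seeds(
        list(fake_traces),
        lambda seed: toy_attention_shift_score(fake_traces[seed]),
        k=2,
    )
    print(kept)  # seeds that would go on to full generation
```

In a real pipeline, `score_seed` would run only the first few denoising steps for each seed and read the cross-attention maps for the core tokens; only the seeds returned here would then be denoised to completion, which is where the compute savings would come from.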
Problem

Research questions and friction points this paper is trying to address.

text-to-image diffusion
random seed sensitivity
image quality
prompt-image alignment
seed effect
Innovation

Methods, ideas, or system contributions that make the work stand out.

seed selection
core token attention
text-to-image diffusion
training-free optimization
cross-attention dynamics