Image Generation from Contextually-Contradictory Prompts

📅 2025-06-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Text-to-image diffusion models often produce distorted outputs when processing semantically implicit conflicting prompts (e.g., “feathered car”), as they struggle to reconcile compositional concepts with pretrained semantic priors. To address this, we propose the Stage-Aware Prompt Decomposition Framework (SPDF), which dynamically injects semantically consistent proxy prompts at each denoising step. SPDF first leverages an LLM to detect conflicts and rewrite prompts while preserving user intent; it then models stage-specific characteristics of the diffusion latent space and enforces contextual consistency to generate refined proxy prompts. Crucially, SPDF achieves precise alignment between the denoising process and semantic evolution—unprecedented in prior work. Evaluated on multiple contradictory prompt benchmarks, SPDF improves CLIP-Score by 12.3% and human evaluation accuracy by 21.7%, yielding images that strictly adhere to the original semantic intent while maintaining visual plausibility.

Technology Category

Application Category

📝 Abstract
Text-to-image diffusion models excel at generating high-quality, diverse images from natural language prompts. However, they often fail to produce semantically accurate results when the prompt contains concept combinations that contradict their learned priors. We define this failure mode as contextual contradiction, where one concept implicitly negates another due to entangled associations learned during training. To address this, we propose a stage-aware prompt decomposition framework that guides the denoising process using a sequence of proxy prompts. Each proxy prompt is constructed to match the semantic content expected to emerge at a specific stage of denoising, while ensuring contextual coherence. To construct these proxy prompts, we leverage a large language model (LLM) to analyze the target prompt, identify contradictions, and generate alternative expressions that preserve the original intent while resolving contextual conflicts. By aligning prompt information with the denoising progression, our method enables fine-grained semantic control and accurate image generation in the presence of contextual contradictions. Experiments across a variety of challenging prompts show substantial improvements in alignment to the textual prompt.
Problem

Research questions and friction points this paper is trying to address.

Generating images from prompts with conflicting concepts
Resolving contextual contradictions in text-to-image models
Improving semantic accuracy in diffusion-based image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stage-aware prompt decomposition for denoising
LLM-generated proxy prompts resolve contradictions
Aligns prompt semantics with denoising progression
🔎 Similar Papers
No similar papers found.