Dynamic VLM-Guided Negative Prompting for Diffusion Models

πŸ“… 2025-10-29
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Traditional diffusion models rely on static negative prompts, which fail to adapt to the dynamic semantic evolution of images during the denoising process. To address this limitation, we propose a vision-language model (VLM)-based dynamic negative prompting method: at critical denoising steps, intermediate denoised images are fed into a VLM to generate context-aware, image-conditioned negative prompts; additionally, a learnable negative guidance strength controller enables fine-grained semantic constraint. This approach breaks the fixed-prompt paradigm and is the first to embed a VLM directly into the diffusion sampling loop for real-time, image-driven negative prompt generation. Extensive experiments across multiple benchmark datasets demonstrate significant improvements in text–image alignment quality, achieving a superior trade-off between fidelity and semantic consistency. Our method establishes a novel paradigm for controllable image generation, advancing the state of the art in conditional diffusion modeling.

Technology Category

Application Category

πŸ“ Abstract
We propose a novel approach for dynamic negative prompting in diffusion models that leverages Vision-Language Models (VLMs) to adaptively generate negative prompts during the denoising process. Unlike traditional Negative Prompting methods that use fixed negative prompts, our method generates intermediate image predictions at specific denoising steps and queries a VLM to produce contextually appropriate negative prompts. We evaluate our approach on various benchmark datasets and demonstrate the trade-offs between negative guidance strength and text-image alignment.
Problem

Research questions and friction points this paper is trying to address.

Dynamic negative prompting adapts to denoising steps
Generates context-aware negative prompts using VLMs
Balances guidance strength and text-image alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic negative prompting using Vision-Language Models
Generates intermediate images for contextual negative prompts
Adaptively produces prompts during denoising process
πŸ”Ž Similar Papers
No similar papers found.