🤖 AI Summary
Current conditional diffusion models rely on classifier-free guidance (CFG), which requires multiple forward passes per denoising step (up to three for tasks like image editing), incurring substantial inference overhead. To address this, we propose Explicit Conditioning (EC), the first method to directly model the noise distribution as a function of multimodal conditional inputs. EC introduces a modality-aware noise prediction network and an end-to-end differentiable conditioning injection mechanism, enabling high-fidelity guided generation with a single forward pass per denoising step. By eliminating dependence on CFG's repeated passes, EC achieves comparable or superior generation quality and diversity while accelerating inference by 2.3×. This work establishes a new paradigm for efficient, controllable diffusion-based image editing.
📝 Abstract
Current sampling mechanisms for conditional diffusion models rely mainly on Classifier-Free Guidance (CFG) to generate high-quality images. However, CFG requires several denoising passes at each time step, e.g., up to three passes in image editing tasks, resulting in excessive computational costs. This paper introduces a novel conditioning technique that eases the computational burden of well-established guidance techniques, thereby significantly improving the inference time of diffusion models. To achieve this, we present Explicit Conditioning (EC) of the noise distribution on the input modalities. Intuitively, we model the noise to guide the conditional diffusion model during the diffusion process. We present evaluations on image editing tasks and demonstrate that EC outperforms CFG in generating diverse high-quality images with significantly reduced computations.
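To make the cost contrast concrete, the sketch below shows where CFG's extra passes come from and why conditioning the noise directly needs only one. This is not the authors' code: the toy network, the three-pass editing combination (as used in InstructPix2Pix-style setups), and the guidance scales are illustrative assumptions, with EC reduced to "one conditional pass per step" per the abstract.

```python
# Sketch (not the paper's implementation): per-step cost of CFG vs a
# single explicitly conditioned pass. All names and scales are placeholders.
import numpy as np

rng = np.random.default_rng(0)

def noise_model(x, cond):
    """Stand-in for a trained noise-prediction network eps_theta(x_t, cond)."""
    bias = 0.0 if cond is None else 0.1 * len(str(cond))  # toy condition effect
    return 0.1 * x + bias

x_t = rng.standard_normal(4)  # a noisy latent at some time step t

# CFG for image editing combines up to three forward passes per time step:
eps_uncond = noise_model(x_t, None)               # pass 1: unconditional
eps_img    = noise_model(x_t, "image")            # pass 2: image condition only
eps_full   = noise_model(x_t, ("image", "text"))  # pass 3: image + text
s_img, s_txt = 1.5, 7.5                           # illustrative guidance scales
eps_cfg = (eps_uncond
           + s_img * (eps_img - eps_uncond)
           + s_txt * (eps_full - eps_img))

# Explicit Conditioning (per the abstract) folds the modalities into the
# noise distribution itself, so each time step needs only one pass:
eps_ec = noise_model(x_t, ("image", "text"))

print(eps_cfg.shape, eps_ec.shape)  # both (4,)
```

The 3× reduction in network evaluations per step is where the reported inference speedup would come from; the exact combination of passes and scales varies by editing pipeline.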