EMAG: Self-Rectifying Diffusion Sampling with Exponential Moving Average Guidance

📅 2025-12-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing guided sampling methods for diffusion and flow-matching models suffer from inflexible target-layer selection and limited control over the granularity and difficulty of negative samples, hindering generation quality and consistency. This paper proposes a training-free, inference-time attention modulation mechanism: it dynamically identifies critical transformer layers via exponential moving average (EMA)–based stability statistics of layer-wise features; leverages these layers to synthesize semantically faithful, fine-grained negative samples with controllable difficulty; and incorporates a self-correction module for precise degradation and restoration. To our knowledge, this is the first work to introduce a statistics-driven, dynamic layer-selection paradigm coupled with fine-grained negative-sample generation. The method is plug-and-play on diffusion Transformers and fully compatible with mainstream guidance techniques—including Classifier-Free Guidance (CFG), Adaptive Perturbation Guidance (APG), and Consistency-Aware Diffusion Sampling (CADS). Experiments demonstrate a +0.46 improvement in Human Preference Score (HPS) at zero training cost, significantly enhancing both generation fidelity and human preference.
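The statistics-driven layer selection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the choice of a mean-normalized feature norm as the per-layer statistic, the decay value, and all function names are assumptions.

```python
import numpy as np

def update_layer_stability(ema_stats, layer_feats, decay=0.9):
    """Update per-layer EMA statistics across denoising steps and return
    an instability score per layer.

    ema_stats:   dict layer_idx -> running EMA of a scalar feature statistic
    layer_feats: dict layer_idx -> feature array from the current step
    (The scalar statistic used here, a mean-normalized feature norm, is
    illustrative; the paper does not specify this exact quantity.)
    """
    scores = {}
    for idx, feats in layer_feats.items():
        stat = float(np.linalg.norm(feats)) / feats.size
        prev = ema_stats.get(idx, stat)          # initialize EMA on first sight
        ema = decay * prev + (1.0 - decay) * stat
        ema_stats[idx] = ema
        # instability: relative deviation of the current statistic from its EMA
        scores[idx] = abs(stat - ema) / (abs(ema) + 1e-8)
    return scores

def select_critical_layers(scores, k=2):
    """Pick the k layers whose statistics deviate most from their running EMA."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A layer whose features shift sharply relative to its own running average is flagged as "critical" and becomes the target for attention modulation; stable layers are left untouched.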

📝 Abstract
In diffusion and flow-matching generative models, guidance techniques are widely used to improve sample quality and consistency. Classifier-free guidance (CFG) is the de facto choice in modern systems and achieves this by contrasting conditional and unconditional samples. Recent work explores contrasting negative samples at inference using a weaker model, via strong/weak model pairs, attention-based masking, stochastic block dropping, or perturbations to the self-attention energy landscape. While these strategies refine generation quality, they still lack reliable control over the granularity or difficulty of the negative samples, and target-layer selection is often fixed. We propose Exponential Moving Average Guidance (EMAG), a training-free mechanism that modifies attention at inference time in diffusion transformers, with a statistics-based, adaptive layer-selection rule. Unlike prior methods, EMAG produces harder, semantically faithful negatives (fine-grained degradations) that surface difficult failure modes and enable the denoiser to refine subtle artifacts, improving generation quality and the human preference score (HPS) by +0.46 over CFG. We further demonstrate that EMAG naturally composes with advanced guidance techniques, such as APG and CADS, further improving HPS.
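The contrastive step shared by CFG and the negative-sample methods above can be written as a one-line extrapolation. This sketch follows the standard CFG formula; treating EMAG's degraded branch as a drop-in replacement for the unconditional branch is our reading, not an equation taken from the paper.

```python
import numpy as np

def guided_prediction(eps_cond, eps_neg, scale=3.0):
    """CFG-style guidance: extrapolate the conditional prediction away
    from the negative branch. In vanilla CFG, eps_neg is the unconditional
    prediction; in EMAG-style methods it comes from a deliberately
    degraded (negative) forward pass. scale=1.0 recovers eps_cond.
    """
    return eps_neg + scale * (eps_cond - eps_neg)
```

A harder, more semantically faithful negative shifts `eps_neg` closer to the failure modes the model actually produces, so the extrapolation corrects subtler artifacts than a generic unconditional branch would.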
Problem

Research questions and friction points this paper is trying to address.

Limited control over the granularity and difficulty of negative samples in guided diffusion sampling
Fixed, inflexible target-layer selection in existing attention-perturbation guidance
Quality and human-preference gaps left by current guidance methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free attention modification at inference
Adaptive layer selection based on statistics
Generates fine-grained semantically faithful negatives