Beyond Randomness: Understand the Order of the Noise in Diffusion

📅 2025-11-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work shows that the initial noise in text-to-image (T2I) diffusion models is not purely stochastic: it encodes interpretable semantic structure. To exploit this property, the authors propose a training-free, architecture-agnostic two-stage noise modulation framework. First, semantic components within the noise are identified via information-theoretic analysis and the unwanted ones are erased. Second, the target semantics are re-injected into the cleaned noise, relying on a theoretical equivalence between the diffusion generation process and semantic injection. The approach requires neither fine-tuning nor retraining. Evaluation across backbone architectures, including DiT and U-Net, reports improvements in inter-step consistency and text-image alignment. The method offers a new perspective on controllable generation in diffusion models, enabling semantics-aware noise manipulation without architectural or training modifications.

📝 Abstract

In text-driven content generation (T2C) diffusion models, the semantics of the generated content are mostly attributed to the interaction between text embeddings and the attention mechanism. The initial noise of the generation process is typically characterized as a random element that contributes to the diversity of the generated content. Contrary to this view, this paper reveals that strong, analyzable patterns lie beneath the random surface of the noise. Specifically, the paper first conducts a comprehensive analysis of the impact of random noise on the model's generation. We find that noise contains rich semantic information, that unwanted semantics can be erased from it in an extremely simple way based on information theory, and that the equivalence between the generation process of a diffusion model and semantic injection can be used to inject semantics into the cleaned noise. We then formalize these observations mathematically and propose a simple but efficient, training-free, and universal two-step "Semantic Erasure-Injection" process to modulate the initial noise in T2C diffusion models. Experimental results demonstrate that the method is consistently effective across various T2C models based on both DiT and UNet architectures. It presents a novel perspective for optimizing diffusion model generation and provides a universal tool for consistent generation.

Problem

Research questions and friction points this paper is trying to address.

Analyzing hidden semantic patterns in diffusion model noise

Developing a noise modulation method to erase unwanted semantics

Injecting controlled semantics into cleaned noise for consistent generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes semantic patterns in diffusion model noise

Uses information theory to erase unwanted noise semantics

Injects new semantics into cleaned noise for generation
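
The two-step erase-then-inject flow summarized above can be pictured with a toy sketch. This page does not describe the paper's actual erasure criterion or injection procedure, so both operations below are placeholder assumptions chosen only to show the control flow: `erase` re-standardizes each channel, and `inject` does a variance-preserving blend with a hypothetical `direction` tensor that stands in for the target semantics. All function names and the `strength` parameter are illustrative and do not come from the paper.

```python
import numpy as np


def erase(noise):
    """Placeholder erasure: remove per-channel mean and rescale to unit variance.

    This is NOT the paper's information-theoretic criterion, only a stand-in.
    """
    mean = noise.mean(axis=(-2, -1), keepdims=True)
    std = noise.std(axis=(-2, -1), keepdims=True)
    return (noise - mean) / (std + 1e-8)


def inject(noise, direction, strength=0.3):
    """Placeholder injection: variance-preserving blend toward `direction`.

    `direction` is a hypothetical tensor carrying the desired semantics.
    """
    direction = erase(direction)  # put the direction on the same scale
    return np.sqrt(1.0 - strength**2) * noise + strength * direction


def modulate_noise(noise, direction, strength=0.3):
    """Two-step flow: clean the initial noise, then inject target semantics."""
    return inject(erase(noise), direction, strength)


rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 64, 64))      # latent-shaped initial noise
direction = rng.standard_normal((4, 64, 64))  # stand-in for target semantics
modulated = modulate_noise(noise, direction)
```

In a real pipeline the modulated tensor would replace the sampler's initial latent; how that is wired in depends on the model and is not specified here.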
