Beyond Randomness: Understand the Order of the Noise in Diffusion

📅 2025-11-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work shows that the initial noise in text-driven content generation (T2C) diffusion models is not purely stochastic but carries analyzable semantic structure. To exploit this property, the work proposes a training-free, architecture-agnostic two-step noise modulation framework: first, unwanted semantics are erased from the noise with a simple information-theoretic procedure; second, the desired semantics are injected into the cleaned noise, using the equivalence between the diffusion generation process and semantic injection. The approach requires neither fine-tuning nor retraining. Experiments on T2C models built on both DiT and UNet backbones report that the method is consistently effective and supports consistent generation, offering a new perspective on controlling diffusion models through the initial noise.

📝 Abstract
In text-driven content generation (T2C) diffusion models, the semantics of the generated content are mostly attributed to the interaction between text embeddings and the attention mechanism. The initial noise of the generation process is typically characterized as a random element that contributes to the diversity of the generated content. Contrary to this view, this paper reveals that beneath the random surface of the noise lie strong, analyzable patterns. Specifically, this paper first conducts a comprehensive analysis of the impact of random noise on the model's generation. We find that the noise not only contains rich semantic information, but also allows unwanted semantics to be erased from it in an extremely simple way based on information theory; the equivalence between the generation process of a diffusion model and semantic injection can then be used to inject semantics into the cleaned noise. We then mathematically decipher these observations and propose a simple but efficient, training-free, and universal two-step "Semantic Erasure-Injection" process to modulate the initial noise in T2C diffusion models. Experimental results demonstrate that our method is consistently effective across various T2C models based on both DiT and UNet architectures. The method presents a novel perspective for optimizing diffusion model generation and provides a universal tool for consistent generation.
Problem

Research questions and friction points this paper is trying to address.

Analyzing hidden semantic patterns in the initial noise of diffusion models
Developing a noise modulation method to erase unwanted semantics
Injecting controlled semantics into the cleaned noise for consistent generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes semantic patterns in diffusion model noise
Uses information theory to erase unwanted noise semantics
Injects new semantics into cleaned noise for generation
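
The erase-then-inject structure listed above can be pictured with a toy sketch. This is not the paper's implementation: the abstract does not specify either operator, so both steps below are placeholders chosen only to show the shape of a two-step noise modulation. The names `sample_initial_noise`, `erase_semantics`, `inject_semantics`, and `predict_clean` are hypothetical, the latent is flattened to a plain list of floats, the "erasure" is a simple shuffle, and the "injection" reuses the standard forward-diffusion mixing formula with an assumed noise-level parameter `alpha_bar`.

```python
import math
import random


def sample_initial_noise(n, seed):
    """Draw n standard-Gaussian values, the usual starting latent of a sampler."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def erase_semantics(noise, seed=0):
    """PLACEHOLDER erasure (an assumption, not the paper's operator).

    Shuffles positions so any positional structure is destroyed while the
    multiset of values, and hence the value histogram, stays the same.
    """
    rng = random.Random(seed)
    shuffled = list(noise)  # copy so the caller's noise is not mutated
    rng.shuffle(shuffled)
    return shuffled


def inject_semantics(noise, predict_clean, alpha_bar):
    """PLACEHOLDER injection (an assumption, not the paper's operator).

    Mixes a clean-latent estimate x0 into the noise with the standard
    forward-diffusion formula: sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * z.
    `predict_clean` is a hypothetical callable standing in for a
    text-conditioned model.
    """
    x0_hat = predict_clean(noise)
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [a * x0 + b * z for x0, z in zip(x0_hat, noise)]


# Toy usage: a constant "target" plays the role of the desired semantics.
z = sample_initial_noise(16, seed=42)
cleaned = erase_semantics(z, seed=7)
modulated = inject_semantics(cleaned, lambda n: [1.0] * len(n), alpha_bar=0.1)
```

The modulated list would then replace the random initial latent passed to a sampler. A small `alpha_bar` keeps the result close to Gaussian noise while nudging it toward the target, which is the only design point this sketch is meant to convey.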
Authors

Song Yan
Senior Engineer at Honor Device Co., Ltd
Computer Vision, Object Tracking & Detection & Segmentation

Min Li
Xi’an High-tech Research Institute

Xinliang Bi
Xi’an High-tech Research Institute

Jian Yang
USTC

Yusen Zhang
PhD Student at Penn State University
Natural Language Processing, Machine Learning

Guanye Xiong
Xi’an High-tech Research Institute

Yunwei Lan
Xi’an High-tech Research Institute

Tao Zhang
HUST

Wei Zhai
USTC

Zheng-Jun Zha
USTC