🤖 AI Summary
This work identifies the critical role of "Massive Activations" (MAs), a prevalent phenomenon in diffusion Transformers (DiTs), in local detail generation. While MAs are ubiquitous, their potential for fine-grained structural modeling remains underexploited. To address this, we propose Detail Guidance (DG), a training-free, model-self-guided strategy that selectively disrupts and reweights MAs via a timestep-aware intervention on feature-map activations, thereby enhancing detail fidelity. DG is plug-and-play, fully compatible with classifier-free guidance (CFG), and agnostic to architectural specifics, supporting state-of-the-art DiT-based models including SD3, SD3.5, and Flux. Extensive experiments demonstrate that DG significantly improves texture sharpness and structural consistency in generated images, without requiring additional training, parameter updates, or architectural modifications.
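The summary describes disrupting MAs through an intervention on feature-map activations. The paper's exact detection criterion and intervention are not given here, so the sketch below is only an illustration of one plausible form: flagging activations whose magnitude vastly exceeds the per-token median (a commonly reported signature of massive activations) and clamping them. The threshold `tau` and the clamping rule are assumptions, not the authors' method.

```python
import numpy as np

def suppress_massive_activations(feats: np.ndarray, tau: float = 50.0) -> np.ndarray:
    """Illustrative MA intervention (an assumption, not the paper's exact rule).

    feats: (num_tokens, hidden_dim) feature map from one DiT block.
    An entry is treated as a massive activation if its magnitude exceeds
    tau times the per-token median magnitude; such entries are clamped
    back to that threshold, preserving sign.
    """
    mag = np.abs(feats)
    med = np.median(mag, axis=-1, keepdims=True)          # per-token scale
    threshold = tau * med
    mask = mag > threshold                                # MA candidates
    return np.where(mask, np.sign(feats) * threshold, feats)
```

In a timestep-aware variant, `tau` (or whether the intervention is applied at all) would be scheduled as a function of the diffusion timestep, since the summary notes that the MA distribution is modulated by the timestep embeddings.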
📝 Abstract
Diffusion Transformers (DiTs) have recently emerged as a powerful backbone for visual generation. Recent observations reveal *Massive Activations* (MAs) in their internal feature maps, yet their function remains poorly understood. In this work, we systematically investigate these activations to elucidate their role in visual generation. We find that these massive activations occur across all spatial tokens, and that their distribution is modulated by the input timestep embeddings. Importantly, our investigations further demonstrate that these massive activations play a key role in local detail synthesis, while having minimal impact on the overall semantic content of the output. Building on these insights, we propose **D**etail **G**uidance (**DG**), an MAs-driven, training-free self-guidance strategy that explicitly enhances local detail fidelity for DiTs. Specifically, DG constructs a degraded "detail-deficient" model by disrupting MAs and leverages it to guide the original network toward higher-quality detail synthesis. Our DG can seamlessly integrate with Classifier-Free Guidance (CFG), enabling further refinement of fine-grained details. Extensive experiments demonstrate that our DG consistently improves fine-grained detail quality across various pre-trained DiTs (e.g., SD3, SD3.5, and Flux).
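The guidance scheme described above, using a degraded "detail-deficient" prediction to push the full model toward better detail while stacking with CFG, can be sketched as an extrapolation on the denoiser outputs, by analogy with how CFG and other self-guidance methods (e.g., PAG/SAG) are combined. The exact combination rule and the guidance weights below are assumptions for illustration, not the paper's formula.

```python
def dg_with_cfg(eps_cond, eps_uncond, eps_detail_deficient,
                w_cfg=7.5, w_dg=2.0):
    """Hypothetical DG + CFG combination (illustrative, not the paper's exact rule).

    eps_cond:             conditional noise prediction of the full model
    eps_uncond:           unconditional noise prediction (for CFG)
    eps_detail_deficient: conditional prediction with MAs disrupted
    The DG term extrapolates away from the detail-deficient prediction,
    mirroring how CFG extrapolates away from the unconditional one.
    """
    eps_cfg = eps_uncond + w_cfg * (eps_cond - eps_uncond)          # standard CFG
    return eps_cfg + w_dg * (eps_cond - eps_detail_deficient)       # added DG term
```

Because the DG term only requires one extra forward pass of the same network (with the MA intervention enabled), this form stays training-free and architecture-agnostic, consistent with the plug-and-play claim.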