SlimDiff: Training-Free, Activation-Guided Hands-free Slimming of Diffusion Models

📅 2025-09-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Diffusion models suffer from high inference costs due to their large parameter counts and iterative denoising process; existing compression methods rely on fine-tuning or retraining, hindering efficient deployment. This paper proposes SlimDiff—the first gradient-free, fine-tuning–free structured compression framework for diffusion models. Leveraging only 500 calibration samples, SlimDiff employs activation covariance spectral analysis to guide module-level low-rank approximation and dynamic pruning. It introduces functional-group matrix decomposition in attention layers (with coupled Q-K and V-O matrices) and feed-forward subnetworks, and adaptively allocates sparsity to suppress error accumulation. Experiments demonstrate that SlimDiff achieves up to 35% inference speedup and reduces parameters by approximately 100 million, while preserving generation quality—marking a significant breakthrough in training-free diffusion model compression.
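The covariance-spectrum idea in the summary above can be sketched in a few lines: collect calibration activations for a module, take the eigenspectrum of their covariance to pick a rank that keeps most of the activation energy, then replace the weight with a truncated factorization. This is an illustrative NumPy sketch, not the paper's implementation; the function names (`activation_covariance_rank`, `low_rank_weight`) and the energy-threshold rank rule are assumptions for exposition, and SlimDiff's activation-weighted decomposition and per-module sparsity allocation are omitted.

```python
import numpy as np

def activation_covariance_rank(acts, energy=0.95):
    # acts: (n_samples, d) calibration activations for one module.
    # Pick the smallest rank whose leading eigenvalues of the
    # activation covariance capture `energy` of the total spectrum.
    cov = acts.T @ acts / acts.shape[0]
    eigvals = np.linalg.eigvalsh(cov)[::-1]        # descending spectrum
    cum = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cum, energy) + 1)

def low_rank_weight(W, rank):
    # Truncated SVD: W ≈ A @ B, stored as two thin factors so the
    # layer becomes two smaller matmuls instead of one large one.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]                     # (out_dim, rank)
    B = Vt[:rank]                                  # (rank, in_dim)
    return A, B
```

In use, a dense layer `W @ x` is replaced by `A @ (B @ x)`, which is cheaper whenever the selected rank is well below the layer's dimensions.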

📝 Abstract
Diffusion models (DMs), lauded for their generative performance, are computationally prohibitive due to their billion-scale parameters and iterative denoising dynamics. Existing efficiency techniques, such as quantization, timestep reduction, or pruning, offer savings in compute, memory, or runtime but are strictly bottlenecked by reliance on fine-tuning or retraining to recover performance. In this work, we introduce SlimDiff, an automated activation-informed structural compression framework that reduces both attention and feedforward dimensionalities in DMs, while being entirely gradient-free. SlimDiff reframes DM compression as a spectral approximation task, where activation covariances across denoising timesteps define low-rank subspaces that guide dynamic pruning under a fixed compression budget. This activation-aware formulation mitigates error accumulation across timesteps by applying module-wise decompositions over functional weight groups: query--key interactions, value--output couplings, and feedforward projections, rather than isolated matrix factorizations, while adaptively allocating sparsity across modules to respect the non-uniform geometry of diffusion trajectories. SlimDiff achieves up to 35% acceleration and $\sim$100M parameter reduction over baselines, with generation quality on par with uncompressed models, without any backpropagation. Crucially, our approach requires only about 500 calibration samples, over 70$\times$ fewer than prior methods. To our knowledge, this is the first closed-form, activation-guided structural compression of DMs that is entirely training-free, providing both theoretical clarity and practical efficiency.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational costs of billion-parameter diffusion models without fine-tuning
Achieving structural compression through activation-informed spectral approximation
Eliminating performance recovery dependency on retraining or backpropagation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free activation-guided structural compression of diffusion models
Spectral approximation using activation covariances for dynamic pruning
Module-wise decomposition over functional weight groups without fine-tuning
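One way to read the coupled query--key decomposition above: attention logits depend on the projection weights only through the product Wq^T Wk, so that coupled map can be factored jointly rather than factorizing Wq and Wk in isolation. The sketch below is a hypothetical plain-SVD illustration of that idea; the paper's formulation additionally weights the factorization by activation statistics, which is not reproduced here, and `compress_qk_coupled` is an assumed name.

```python
import numpy as np

def compress_qk_coupled(Wq, Wk, rank):
    # Wq, Wk: (d_head, d_model) projection weights.
    # Attention logits use only x^T (Wq^T Wk) y, so compress the
    # coupled map M = Wq^T Wk jointly and split the truncated
    # factors back into new query and key projections.
    M = Wq.T @ Wk                                  # (d_model, d_model)
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.sqrt(S[:rank])
    Wq_new = (U[:, :rank] * s).T                   # (rank, d_model)
    Wk_new = s[:, None] * Vt[:rank]                # (rank, d_model)
    return Wq_new, Wk_new                          # Wq_new.T @ Wk_new ≈ M
```

Splitting the singular values symmetrically (sqrt into each factor) keeps the two new projections at comparable scale, which tends to be friendlier to downstream numerics than assigning all of S to one side.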