🤖 AI Summary
Diffusion models suffer from high computational overhead during inference, and existing acceleration methods overlook the dynamic nature of timesteps while treating pruning and fine-tuning as disjoint stages, leading to suboptimal performance. This paper proposes ALTER, a unified framework that jointly optimizes layer-wise pruning and timestep-specific expert routing in a single-stage, end-to-end trainable manner. ALTER employs a trainable hypernetwork to dynamically generate timestep-adaptive pruning decisions and dispatch inputs to specialized sparse subnetworks. The UNet backbone and hypernetwork are co-fine-tuned under sparsity constraints to enable efficient sparse inference. Evaluated on Stable Diffusion v2.1, ALTER matches the visual quality of the full 50-step baseline using only 20 timesteps, reduces MACs by 74.1% (i.e., retains only 25.9%), and delivers a 3.64× speedup at 35% sparsity, significantly improving the efficiency–quality trade-off.
📝 Abstract
Diffusion models have demonstrated exceptional capabilities in generating high-fidelity images. However, their iterative denoising process incurs significant computational overhead at inference time, limiting their practical deployment in resource-constrained environments. Existing acceleration methods often adopt uniform strategies that fail to capture the temporal variation across diffusion steps, while the commonly adopted sequential prune-then-fine-tune pipeline is suboptimal because pruning decisions made on pretrained weights are misaligned with the model's final parameters. To address these limitations, we introduce ALTER: All-in-One Layer Pruning and Temporal Expert Routing, a unified framework that transforms diffusion models into a mixture of efficient temporal experts. ALTER achieves a single-stage optimization that unifies layer pruning, expert routing, and model fine-tuning by employing a trainable hypernetwork, which dynamically generates layer-pruning decisions and routes each timestep to a specialized pruned expert sub-network throughout the ongoing fine-tuning of the UNet. This unified co-optimization strategy enables significant efficiency gains while preserving high generative quality. Specifically, ALTER matches the visual fidelity of the original 50-step Stable Diffusion v2.1 model while using only 25.9% of its total MACs with just 20 inference steps, and delivers a 3.64× speedup at 35% sparsity.
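To make the routing idea concrete, here is a minimal NumPy sketch of a timestep-conditioned hypernetwork that emits per-layer keep/prune gates, in the spirit described above. All names (`HyperNetworkRouter`, `timestep_embedding`) and the hard top-k gating are illustrative assumptions, not the paper's actual implementation; in training, a differentiable relaxation of the top-k selection would be used so the hypernetwork and UNet can be co-optimized under a sparsity constraint.

```python
import numpy as np

def timestep_embedding(t, dim=16):
    # Standard sinusoidal embedding of the diffusion timestep (DDPM-style).
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

class HyperNetworkRouter:
    """Toy hypernetwork: maps a timestep embedding to binary per-layer gates.

    A gate of 1 keeps the layer for this timestep's expert sub-network;
    0 prunes (skips) it. The sparsity level fixes how many layers survive.
    """
    def __init__(self, num_layers, emb_dim=16, sparsity=0.35, seed=0):
        rng = np.random.default_rng(seed)
        # A single linear layer stands in for the learned hypernetwork.
        self.W = rng.standard_normal((num_layers, emb_dim)) * 0.1
        self.b = np.zeros(num_layers)
        self.keep = int(round(num_layers * (1.0 - sparsity)))

    def gates(self, t):
        # Score each layer for this timestep, keep the top-k highest-scoring
        # layers (hard selection shown; training would relax this).
        scores = self.W @ timestep_embedding(t, self.W.shape[1]) + self.b
        mask = np.zeros_like(scores)
        mask[np.argsort(scores)[-self.keep:]] = 1.0
        return mask

router = HyperNetworkRouter(num_layers=20, sparsity=0.35)
mask_early = router.gates(t=900)  # gates for an early (noisy) timestep
mask_late = router.gates(t=50)    # gates for a late (refinement) timestep
print(int(mask_early.sum()))      # 13 of 20 layers kept at 35% sparsity
```

Because the gates depend on the timestep embedding, different denoising steps can activate different pruned sub-networks, which is what lets a timestep-adaptive scheme outperform a single uniform pruning mask.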