ALTER: All-in-One Layer Pruning and Temporal Expert Routing for Efficient Diffusion Generation

📅 2025-05-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Diffusion models suffer from high computational overhead during inference, and existing acceleration methods overlook the dynamic nature of timesteps while treating pruning and fine-tuning as disjoint stages—leading to suboptimal performance. This paper proposes ALTER, a unified framework that jointly optimizes layer-wise pruning and timestep-specific expert routing in a single-stage, end-to-end trainable manner. ALTER employs a trainable hypernetwork to dynamically generate timestep-adaptive pruning decisions and dispatches inputs to specialized sparse subnetworks. The UNet backbone and hypernetwork are co-fine-tuned under sparsity constraints to enable efficient sparse inference. Evaluated on Stable Diffusion v2.1, ALTER achieves comparable visual quality to the full 50-step baseline using only 20 timesteps, reduces MACs by 74.1% (i.e., retains only 25.9%), and delivers a 3.64× speedup at 35% sparsity—significantly improving the efficiency–quality trade-off.
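To make the mechanism concrete, here is a minimal pure-Python sketch of the core idea: a small trainable hypernetwork maps the diffusion timestep to per-layer keep scores and thresholds them into a binary layer mask at a target sparsity, so different timesteps can route to different pruned subnetworks. All names, the sinusoidal embedding, the linear hypernetwork, and the top-k mask selection are illustrative assumptions, not the paper's actual architecture (ALTER co-trains its hypernetwork end-to-end with the UNet under sparsity constraints).

```python
import math
import random

def timestep_embedding(t, dim=16):
    """Sinusoidal embedding of the diffusion timestep (DDPM-style)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

class PruningHypernetwork:
    """Toy hypernetwork: maps a timestep embedding to per-layer keep scores,
    then retains the top-(1 - sparsity) fraction of layers as a binary mask.
    Timesteps that receive the same mask share one pruned 'expert' subnetwork."""
    def __init__(self, num_layers, emb_dim=16, sparsity=0.35, seed=0):
        rng = random.Random(seed)
        # Random linear weights stand in for the learned hypernetwork parameters.
        self.W = [[rng.gauss(0, 0.1) for _ in range(emb_dim)]
                  for _ in range(num_layers)]
        self.sparsity = sparsity

    def layer_mask(self, t):
        emb = timestep_embedding(t, len(self.W[0]))
        scores = [sum(w * e for w, e in zip(row, emb)) for row in self.W]
        keep = round(len(scores) * (1.0 - self.sparsity))
        top = set(sorted(range(len(scores)), key=lambda i: scores[i])[-keep:])
        return [1 if i in top else 0 for i in range(len(scores))]

hyper = PruningHypernetwork(num_layers=20)
mask_early = hyper.layer_mask(900)  # mask for an early (noisy) timestep
mask_late = hyper.layer_mask(50)    # mask for a late (refinement) timestep
print(sum(mask_early), sum(mask_late))  # each keeps 13 of 20 layers (35% pruned)
```

In the real method the hard top-k selection would be replaced by a differentiable relaxation so the pruning decisions can be trained jointly with the UNet weights.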

📝 Abstract
Diffusion models have demonstrated exceptional capabilities in generating high-fidelity images. However, their iterative denoising process results in significant computational overhead during inference, limiting their practical deployment in resource-constrained environments. Existing acceleration methods often adopt uniform strategies that fail to capture the temporal variations during diffusion generation, while the commonly adopted sequential pruning-then-fine-tuning strategy suffers from sub-optimality due to the misalignment between pruning decisions made on pretrained weights and the model's final parameters. To address these limitations, we introduce ALTER: All-in-One Layer Pruning and Temporal Expert Routing, a unified framework that transforms diffusion models into a mixture of efficient temporal experts. ALTER achieves a single-stage optimization that unifies layer pruning, expert routing, and model fine-tuning by employing a trainable hypernetwork, which dynamically generates layer pruning decisions and manages timestep routing to specialized, pruned expert sub-networks throughout the ongoing fine-tuning of the UNet. This unified co-optimization strategy enables significant efficiency gains while preserving high generative quality. Specifically, ALTER matches the visual fidelity of the original 50-step Stable Diffusion v2.1 model while utilizing only 25.9% of its total MACs with just 20 inference steps, and delivers a 3.64x speedup through 35% sparsity.
Problem

Research questions and friction points this paper is trying to address.

Reduce computational overhead in diffusion model inference
Address sub-optimality in sequential pruning strategies
Unify layer pruning and expert routing for efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified layer pruning and expert routing
Dynamic hypernetwork for single-stage optimization
Efficient temporal experts for diffusion models
Authors
Xiaomeng Yang — Northeastern University
Lei Lu — Northeastern University
Qihui Fan — Graduate Student, Northeastern University
Changdi Yang — PhD candidate, Northeastern University; Snap Inc. (Efficient Deep Learning)
Juyi Lin — Northeastern University
Yanzhi Wang — Northeastern University
Xuan Zhang — Northeastern University
Shangqian Gao — Florida State University
Computer Vision · Natural Language Processing · Machine Learning