🤖 AI Summary
Diffusion models suffer from slow inference and high computational redundancy, while existing neural architecture search (NAS) methods are inefficient due to the need to train and evaluate thousands of candidate architectures. To address this, we propose a **training-free, segment-wise NAS paradigm**: the denoising process is partitioned into equal-length segments, each dynamically selecting among full-step, partial-step, or no-step structural configurations—jointly sparsifying both the number of generation steps and network block execution. We introduce zero-shot architecture evaluation and a hybrid step-type scheduling mechanism, ensuring compatibility with both Latent Diffusion Models (LDM) and Stable Diffusion. Our method achieves 2.6× and 5.1× inference acceleration on LDM-4-G and Stable Diffusion v1.5, respectively, outperforming state-of-the-art approaches. Extensive evaluation across multiple datasets confirms substantial reduction in denoising redundancy while preserving image fidelity.
📝 Abstract
Diffusion models are cutting-edge generative models adept at producing diverse, high-quality images. Despite their effectiveness, these models often require significant computational resources owing to their numerous sequential denoising steps and the substantial inference cost of each step. Recently, Neural Architecture Search (NAS) techniques have been employed to automatically search for faster generation processes. However, NAS for diffusion is inherently time-consuming, as it requires evaluating thousands of diffusion models to find the optimal one. In this paper, we introduce Flexiffusion, a novel training-free NAS paradigm designed to accelerate diffusion models by concurrently optimizing generation steps and network structures. Specifically, we partition the generation process into isometric step segments, each sequentially composed of a full step, multiple partial steps, and several null steps. The full step computes all network blocks, the partial step involves only part of the blocks, and the null step entails no computation. Flexiffusion autonomously explores flexible step combinations for each segment, substantially reducing search costs and enabling greater acceleration compared to the state-of-the-art (SOTA) method for diffusion models. Our searched models achieve speedup factors of $2.6\times$ and $1.5\times$ over the original LDM-4-G and the SOTA, respectively; for Stable Diffusion V1.5, the factors over the original and the SOTA are $5.1\times$ and $2.0\times$. We also verified the performance of Flexiffusion on multiple datasets, and positive experiment results indicate that Flexiffusion can effectively reduce redundancy in diffusion models.
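The segment structure described above (one full step, then partial steps, then null steps, repeated over isometric segments) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class names, the per-step cost weights, and the `schedule_cost` helper are all hypothetical, standing in for whatever FLOPs or latency measurement a real search would use.

```python
from dataclasses import dataclass
from typing import List

# Step types per the abstract: a "full" step runs all network blocks,
# a "partial" step runs only some blocks, and a "null" step performs
# no computation. Labels here are illustrative.
FULL, PARTIAL, NULL = "full", "partial", "null"

@dataclass
class Segment:
    """One isometric segment of the denoising schedule: a full step,
    followed by some partial steps, then some null steps."""
    n_partial: int
    n_null: int

    def steps(self) -> List[str]:
        return [FULL] + [PARTIAL] * self.n_partial + [NULL] * self.n_null

def schedule_cost(segments: List[Segment],
                  full_cost: float = 1.0,
                  partial_cost: float = 0.5,
                  null_cost: float = 0.0) -> float:
    """Rough relative compute cost of a candidate schedule.
    The per-step costs are hypothetical placeholders; an actual
    search would measure FLOPs or wall-clock latency per step."""
    cost = {FULL: full_cost, PARTIAL: partial_cost, NULL: null_cost}
    return sum(cost[s] for seg in segments for s in seg.steps())

# Example: 5 segments of length 4. The candidate sparsifies each
# segment to 1 full + 1 partial + 2 null steps; the baseline runs
# partial computation at every non-full step.
candidate = [Segment(n_partial=1, n_null=2) for _ in range(5)]
baseline = [Segment(n_partial=3, n_null=0) for _ in range(5)]
print(schedule_cost(candidate) / schedule_cost(baseline))  # → 0.6
```

A training-free search over such schedules would score each candidate with a zero-shot proxy instead of training it, then keep the cheapest schedule whose proxy score stays above a quality threshold.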