Flexiffusion: Training-Free Segment-Wise Neural Architecture Search for Efficient Diffusion Models

📅 2025-06-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Diffusion models suffer from slow inference and high energy consumption, while existing neural architecture search (NAS) methods are hindered by costly retraining, exponential search complexity, and inefficient generative evaluation. This paper proposes a retraining-free, segment-wise NAS framework: it partitions the diffusion generation process into equal-length segments and dynamically schedules each step as "full computation," "feature reuse," or "skip," jointly optimizing the scheduling policy and architecture under frozen pre-trained weights. The framework introduces the first segment-wise search space, reducing search complexity from exponential to polynomial. It also proposes relative FID (rFID), a zero-shot, teacher-guided metric that enables ultra-fast evaluation, cutting evaluation time by over 90%. The method achieves ≥2× speedup with <5% FID degradation on LDM, Stable Diffusion, and DDPM; notably, it delivers 5.1× acceleration on Stable Diffusion with a near-lossless CLIP Score.

📝 Abstract
Diffusion models (DMs) are powerful generative models capable of producing high-fidelity images but are constrained by high computational costs due to iterative multi-step inference. While Neural Architecture Search (NAS) can optimize DMs, existing methods are hindered by retraining requirements, exponential search complexity from step-wise optimization, and slow evaluation relying on massive image generation. To address these challenges, we propose Flexiffusion, a training-free NAS framework that jointly optimizes generation schedules and model architectures without modifying pre-trained parameters. Our key insight is to decompose the generation process into flexible segments of equal length, where each segment dynamically combines three step types: full (complete computation), partial (cache-reused computation), and null (skipped computation). This segment-wise search space reduces the candidate pool exponentially compared to step-wise NAS while preserving architectural diversity. Further, we introduce relative FID (rFID), a lightweight evaluation metric for NAS that measures divergence from a teacher model's outputs instead of ground truth, slashing evaluation time by over 90%. In practice, Flexiffusion achieves at least 2× acceleration across LDMs, Stable Diffusion, and DDPMs on ImageNet and MS-COCO, with FID degradation under 5%, outperforming prior NAS and caching methods. Notably, it attains 5.1× speedup on Stable Diffusion with near-identical CLIP scores. Our work pioneers a resource-efficient paradigm for searching high-speed DMs without sacrificing quality.
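The full/partial/null decomposition described in the abstract can be sketched as a toy scheduler. Everything below is an illustrative assumption rather than the paper's implementation: the function names, the 0.3 cost ratio for a cache-reusing partial step, and the example patterns are all made up to show the mechanics. Note also how the search space shrinks: a step-wise search over T steps has 3^T schedules, while picking one of P candidate patterns per segment gives only P^S.

```python
# Toy sketch of a Flexiffusion-style segment-wise schedule (illustrative only).
FULL, PARTIAL, NULL = "full", "partial", "null"

def make_schedule(num_steps, segment_patterns):
    """Concatenate equal-length per-segment step-type patterns."""
    seg_len = num_steps // len(segment_patterns)
    schedule = []
    for pattern in segment_patterns:
        assert len(pattern) == seg_len, "segments must have equal length"
        schedule.extend(pattern)
    return schedule

def relative_cost(schedule, partial_cost=0.3):
    """Cost relative to running a full network pass at every step."""
    cache_ready = False
    cost = 0.0
    for step_type in schedule:
        if step_type == FULL:
            cost += 1.0          # full pass; refreshes the feature cache
            cache_ready = True
        elif step_type == PARTIAL:
            assert cache_ready   # reuses cached deep features
            cost += partial_cost # only shallow layers recomputed (assumed ratio)
        # NULL: the step is skipped entirely
    return cost / len(schedule)

# 12 steps, 3 segments of length 4; later segments do progressively less work.
sched = make_schedule(12, [
    [FULL, PARTIAL, PARTIAL, FULL],
    [FULL, PARTIAL, NULL, PARTIAL],
    [FULL, NULL, NULL, NULL],
])
print(round(relative_cost(sched), 3))  # → 0.433, i.e. roughly a 2.3× speedup
```

The search would score many such candidate schedules (via rFID) and keep the cheapest one whose quality stays close to the teacher's.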
Problem

Research questions and friction points this paper is trying to address.

Reducing computational costs in diffusion models without retraining
Optimizing generation schedules and model architectures jointly
Accelerating diffusion models while maintaining output quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free NAS for diffusion models
Segment-wise search with dynamic computation
Lightweight rFID metric for fast evaluation
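The rFID idea, scoring a candidate against a teacher model's outputs instead of real images, can be illustrated with a toy one-dimensional Fréchet distance. The real metric operates on high-dimensional Inception features with full covariance matrices; this sketch, including the function name and the sample values, is a simplified assumption meant only to show why no ground-truth dataset is needed.

```python
import math
from statistics import fmean, pvariance

def rfid_1d(student_feats, teacher_feats):
    """Toy 1-D 'relative FID': Fréchet distance between Gaussians fit to
    a candidate's features and a *teacher's* features (no real images)."""
    m1, m2 = fmean(student_feats), fmean(teacher_feats)
    v1, v2 = pvariance(student_feats), pvariance(teacher_feats)
    return (m1 - m2) ** 2 + v1 + v2 - 2 * math.sqrt(v1 * v2)

teacher      = [0.10, 0.20, 0.15, 0.18]  # made-up teacher feature samples
good_student = [0.11, 0.19, 0.16, 0.17]  # close to the teacher -> low rFID
bad_student  = [0.90, 1.10, 1.00, 0.95]  # far from the teacher -> high rFID

assert rfid_1d(good_student, teacher) < rfid_1d(bad_student, teacher)
```

Because the reference statistics come from a fixed teacher run rather than a large real-image set, each candidate needs only a handful of generations to be ranked, which is where the claimed >90% evaluation-time saving comes from.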