Early-Bird Diffusion: Investigating and Leveraging Timestep-Aware Early-Bird Tickets in Diffusion Models for Efficient Training

📅 2025-04-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion models (DMs) incur substantial training overhead because each update requires forward and backward passes across many timesteps. To address this, we propose EB-Diff-Train, a timestep-aware dynamic sparse training framework. We first identify timestep-aware early-bird tickets in DMs: sparse subnetworks that emerge early in training, adapt across diffusion timestep regions, and remain highly trainable. Leveraging this insight, EB-Diff-Train varies sparsity with timestep importance, retaining denser computation at critical timesteps while aggressively sparsifying non-critical regions. The resulting subnetworks are trained in parallel and combined at inference. Extensive experiments show that EB-Diff-Train preserves generation quality while accelerating training by 2.9×–5.8× over dense baselines and by up to 10.3× over standard train-prune-finetune pipelines, significantly reducing training cost without compromising fidelity or diversity.
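To make the early-bird idea concrete, below is a minimal sketch of the mask-stability criterion commonly used to "draw" EB tickets, assuming unstructured global magnitude pruning; the paper targets diffusion U-Nets and its exact detection rule may differ, and all names here (`magnitude_mask`, `found_eb_ticket`, the threshold and patience values) are illustrative assumptions, not the authors' implementation.

```python
# Sketch: detect an early-bird ticket by checking when the pruning mask
# stabilizes across epochs. Illustrative only; not the paper's code.
import torch

def magnitude_mask(model: torch.nn.Module, sparsity: float) -> torch.Tensor:
    """Global magnitude-pruning mask: 1 keeps a weight, 0 prunes it."""
    weights = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = max(1, int(sparsity * weights.numel()))
    threshold = weights.kthvalue(k).values
    return (weights > threshold).float()

def mask_distance(m1: torch.Tensor, m2: torch.Tensor) -> float:
    """Normalized Hamming distance between two binary masks."""
    return (m1 != m2).float().mean().item()

def found_eb_ticket(history: list, eps: float = 0.1, patience: int = 5) -> bool:
    """The ticket has emerged once recent consecutive masks barely change."""
    if len(history) < patience + 1:
        return False
    recent = [mask_distance(history[i], history[i + 1])
              for i in range(-patience - 1, -1)]
    return max(recent) < eps

# Usage inside a training loop (illustrative):
#   history = []
#   for epoch in range(max_epochs):
#       train_one_epoch(model)                       # standard DM training
#       history.append(magnitude_mask(model, 0.5))   # snapshot the mask
#       if found_eb_ticket(history):
#           break   # draw the ticket early; continue training it sparsely
```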

📝 Abstract
Training diffusion models (DMs) requires substantial computational resources due to multiple forward and backward passes across numerous timesteps, motivating research into efficient training techniques. In this paper, we propose EB-Diff-Train, a new efficient DM training approach that is orthogonal to other methods of accelerating DM training, by investigating and leveraging Early-Bird (EB) tickets -- sparse subnetworks that manifest early in the training process and maintain high generation quality. We first investigate the existence of traditional EB tickets in DMs, enabling competitive generation quality without fully training a dense model. Then, we delve into the concept of diffusion-dedicated EB tickets, drawing on insights from the varying importance of different timestep regions. These tickets adapt their sparsity levels according to the importance of corresponding timestep regions, allowing for aggressive sparsity during non-critical regions while conserving computational resources for crucial timestep regions. Building on this, we develop an efficient DM training technique that derives timestep-aware EB tickets, trains them in parallel, and combines them during inference for image generation. Extensive experiments validate the existence of both traditional and timestep-aware EB tickets, as well as the effectiveness of our proposed EB-Diff-Train method. This approach can significantly reduce training time both spatially and temporally -- achieving 2.9× to 5.8× speedups over training unpruned dense models, and up to 10.3× faster training compared to standard train-prune-finetune pipelines -- without compromising generative quality. Our code is available at https://github.com/GATECH-EIC/Early-Bird-Diffusion.
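To illustrate the timestep-aware tickets described in the abstract, here is a minimal sketch assuming a 1000-step trajectory split into contiguous regions, each with its own sparsity level; the region boundaries, sparsity values, and names below are hypothetical placeholders, not the paper's configuration.

```python
# Sketch: assign each timestep region its own sparsity level, and route
# each denoising step to that region's subnetwork at inference.
from dataclasses import dataclass

@dataclass
class TimestepRegion:
    t_start: int      # inclusive
    t_end: int        # exclusive
    sparsity: float   # fraction of weights pruned in this region's ticket

# Hypothetical 3-region split: keep the ticket dense where timesteps
# matter most, prune aggressively where they matter least.
REGIONS = [
    TimestepRegion(0,   200,  0.3),   # critical region: mild sparsity
    TimestepRegion(200, 700,  0.7),   # middle region: moderate sparsity
    TimestepRegion(700, 1000, 0.9),   # non-critical: aggressive sparsity
]

def ticket_for_timestep(t: int) -> TimestepRegion:
    """Select the subnetwork responsible for denoising step t."""
    for region in REGIONS:
        if region.t_start <= t < region.t_end:
            return region
    raise ValueError(f"timestep {t} outside the trajectory")
```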
Problem

Research questions and friction points this paper is trying to address.

Reducing the computational cost of diffusion model training
Identifying sparse subnetworks for efficient training
Allocating computation across timestep regions according to their importance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages Early-Bird tickets for efficient training
Adapts sparsity levels by timestep importance
Trains timestep-aware tickets in parallel and combines them at inference (see the sketch after this list)
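Because each region's ticket is an independent subnetwork, the tickets can be trained concurrently. A minimal sketch of that parallelism, assuming one worker process per ticket and a placeholder `train_ticket` routine (the paper's actual training loop is not reproduced here):

```python
# Sketch: train one ticket per timestep region in parallel processes.
from concurrent.futures import ProcessPoolExecutor

def train_ticket(region_id: int, sparsity: float, device: str) -> str:
    # Placeholder: apply the region's pruning mask at `sparsity`, then run
    # the usual denoising-objective training loop restricted to this
    # region's timesteps on `device`. Returns a checkpoint path.
    return f"ticket_{region_id}.pt"

if __name__ == "__main__":
    jobs = [(0, 0.3, "cuda:0"), (1, 0.7, "cuda:1"), (2, 0.9, "cuda:2")]
    with ProcessPoolExecutor(max_workers=len(jobs)) as pool:
        checkpoints = list(pool.map(train_ticket, *zip(*jobs)))
    print(checkpoints)  # one trained subnetwork per timestep region
```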
👥 Authors
Lexington Whalen (Georgia Institute of Technology)
Zhenbang Du (Georgia Institute of Technology)
Haoran You (Georgia Institute of Technology; Efficient ML, Alg-HW Co-Design)
Chaojian Li (Hong Kong University of Science and Technology; Efficient AI, Hardware/Software Co-Design)
Sixu Li (Georgia Institute of Technology)
Y. Lin (Georgia Institute of Technology)