Learning Few-Step Diffusion Models by Trajectory Distribution Matching

📅 2025-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion models face a fundamental trade-off between generation quality and sampling efficiency in few-step inference: existing distribution-matching methods lack multi-step flexibility, while trajectory-matching approaches degrade image fidelity. To address this, we propose Trajectory Distribution Matching (TDM), a unified distillation framework. Its key contributions are: (1) a data-free score distillation objective ensuring distribution-level consistency between teacher and student trajectories; and (2) a sampling-step-aware objective that decouples per-step learning, enabling deterministic sampling and flexible step adaptation. Experiments demonstrate that TDM distills PixArt-α into a 4-step generator achieving superior user preference scores over the teacher at 1024×1024 resolution—requiring only 500 iterations and 2 A800 GPU-hours (0.01% of teacher training cost). Extended to video generation, TDM attains 81.65 on VBench with just 4 NFE, surpassing CogVideoX-2B (80.91).

📝 Abstract
Accelerating diffusion model sampling is crucial for efficient AIGC deployment. While diffusion distillation methods -- based on distribution matching and trajectory matching -- reduce sampling to as few as one step, they fall short on complex tasks like text-to-image generation. Few-step generation offers a better balance between speed and quality, but existing approaches face a persistent trade-off: distribution matching lacks flexibility for multi-step sampling, while trajectory matching often yields suboptimal image quality. To bridge this gap, we propose learning few-step diffusion models by Trajectory Distribution Matching (TDM), a unified distillation paradigm that combines the strengths of distribution and trajectory matching. Our method introduces a data-free score distillation objective, aligning the student's trajectory with the teacher's at the distribution level. Further, we develop a sampling-steps-aware objective that decouples learning targets across different steps, enabling more adjustable sampling. This approach supports both deterministic sampling for superior image quality and flexible multi-step adaptation, achieving state-of-the-art performance with remarkable efficiency. Our model, TDM, outperforms existing methods on various backbones, such as SDXL and PixArt-$\alpha$, delivering superior quality and significantly reduced training costs. In particular, our method distills PixArt-$\alpha$ into a 4-step generator that outperforms its teacher on real user preference at 1024 resolution. This is accomplished with 500 iterations and 2 A800 hours -- a mere 0.01% of the teacher's training cost. In addition, our proposed TDM can be extended to accelerate text-to-video diffusion. Notably, TDM can outperform its teacher model (CogVideoX-2B) by using only 4 NFE on VBench, improving the total score from 80.91 to 81.65. Project page: https://tdm-t2x.github.io/
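The core idea behind score distillation for distribution matching can be illustrated with a deliberately simplified sketch. Below is a toy 1-D example (an assumption for illustration only; the paper's TDM objective operates on full multi-step diffusion trajectories, and all names here are hypothetical): the student generator's samples are pushed by the difference between its own score and the teacher's score, which drives the student's distribution toward the teacher's.

```python
import random

# Toy 1-D illustration of distribution-matching distillation.
# Hypothetical simplification: both teacher and student are unit-variance
# Gaussians, whose score d/dx log p(x) has the closed form (mu - x).

def score_gaussian(x, mu):
    # Score of N(mu, 1) at point x
    return mu - x

mu_teacher = 3.0   # distribution the frozen teacher models
mu_student = 0.0   # one-step student generator: x = mu_student + noise

rng = random.Random(0)
lr = 0.1
for _ in range(200):
    x = mu_student + rng.gauss(0.0, 1.0)  # sample from the student
    # Distribution-matching gradient: student ("fake") score minus
    # teacher ("real") score, evaluated at the student's sample
    grad = score_gaussian(x, mu_student) - score_gaussian(x, mu_teacher)
    mu_student -= lr * grad

print(round(mu_student, 2))  # → 3.0 (student matches the teacher)
```

In this Gaussian toy case the score difference reduces to `mu_student - mu_teacher`, so the update converges geometrically; TDM's contribution is to apply this distribution-level alignment along the sampling trajectory with step-aware targets rather than at a single step.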
Problem

Research questions and friction points this paper is trying to address.

Accelerating diffusion model sampling for efficient AIGC deployment.
Balancing speed and quality in few-step text-to-image generation.
Reducing training costs while maintaining superior image quality.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trajectory Distribution Matching for diffusion models
Data-free score distillation for trajectory alignment
Sampling-steps-aware objective for adjustable sampling