Score Distillation of Flow Matching Models

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Flow Matching (FM) text-to-image models suffer from low sampling efficiency due to their reliance on multi-step iterative solvers. To address this, we extend Score identity Distillation (SiD) to DiT-based FM models, the first direct application of score distillation in this setting. Leveraging a Bayes-rule derivation, SiD establishes a unified perspective on score-field modeling across diffusion and flow matching, enabling distillation without teacher fine-tuning or architectural modifications, and it supports both data-free and data-aided distillation paradigms. Evaluated on state-of-the-art models, including SANA, SD3, and FLUX.1-dev, SiD achieves high-fidelity, text-aligned image generation in one or a few steps after lightweight adaptation. This yields substantial inference speedup while preserving visual quality and semantic alignment. SiD thus provides a unified, efficient acceleration framework bridging the diffusion and flow-matching generative paradigms.

📝 Abstract
Diffusion models achieve high-quality image generation but are limited by slow iterative sampling. Distillation methods alleviate this by enabling one- or few-step generation. Flow matching, originally introduced as a distinct framework, has since been shown to be theoretically equivalent to diffusion under Gaussian assumptions, raising the question of whether distillation techniques such as score distillation transfer directly. We provide a simple derivation -- based on Bayes' rule and conditional expectations -- that unifies Gaussian diffusion and flow matching without relying on ODE/SDE formulations. Building on this view, we extend Score identity Distillation (SiD) to pretrained text-to-image flow-matching models, including SANA, SD3-Medium, SD3.5-Medium/Large, and FLUX.1-dev, all with DiT backbones. Experiments show that, with only modest flow-matching- and DiT-specific adjustments, SiD works out of the box across these models, in both data-free and data-aided settings, without requiring teacher finetuning or architectural changes. This provides the first systematic evidence that score distillation applies broadly to text-to-image flow matching models, resolving prior concerns about stability and soundness and unifying acceleration techniques across diffusion- and flow-based generators. We will make the PyTorch implementation publicly available.
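The Bayes-rule unification the abstract refers to can be made concrete. Under a Gaussian path $x_t = \alpha_t x_0 + \sigma_t \epsilon$, taking conditional expectations gives the standard identity $v(x,t) = (\dot\alpha_t/\alpha_t)\,x - \sigma_t\big(\dot\sigma_t - (\dot\alpha_t/\alpha_t)\sigma_t\big)\,\nabla\log p_t(x)$, so a score estimate converts directly into a flow-matching velocity and vice versa. Below is a minimal NumPy sketch of that conversion, not the paper's code; the rectified-flow schedule $\alpha_t = 1 - t$, $\sigma_t = t$ and all function names are illustrative assumptions:

```python
import numpy as np

# Illustrative rectified-flow schedule (an assumption, not the paper's choice):
# alpha_t = 1 - t, sigma_t = t, with time derivatives -1 and 1.
def alpha(t): return 1.0 - t
def sigma(t): return t
def d_alpha(t): return -1.0
def d_sigma(t): return 1.0

def velocity_from_score(x, t, score):
    """Convert a score estimate s(x, t) = grad log p_t(x) into the
    flow-matching velocity under the Gaussian path x_t = alpha_t x0 + sigma_t eps:

        v = (alpha'/alpha) x - sigma (sigma' - (alpha'/alpha) sigma) s(x, t)
    """
    ratio = d_alpha(t) / alpha(t)
    return ratio * x - sigma(t) * (d_sigma(t) - ratio * sigma(t)) * score

# Sanity check on a case with a closed form: if x0 ~ N(0, 1) then
# p_t = N(0, alpha_t^2 + sigma_t^2), and the marginal velocity is
# v(x) = (alpha' alpha + sigma' sigma) / (alpha^2 + sigma^2) * x.
t = 0.3
x = np.linspace(-2.0, 2.0, 5)
var = alpha(t) ** 2 + sigma(t) ** 2
score = -x / var                      # exact score of N(0, var)
v = velocity_from_score(x, t, score)
v_true = (d_alpha(t) * alpha(t) + d_sigma(t) * sigma(t)) / var * x
assert np.allclose(v, v_true)
```

Because the map is affine and invertible in the score, the same identity read in reverse lets a pretrained flow-matching velocity network stand in for the score network that score-distillation losses expect, which is what makes the transfer "out of the box" plausible.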
Problem

Research questions and friction points this paper is trying to address.

Extending score distillation to accelerate flow matching models
Unifying diffusion and flow matching frameworks theoretically
Enabling efficient few-step text-to-image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends score distillation to flow matching models
Unifies diffusion and flow matching via Bayes' rule
Enables acceleration without teacher finetuning or architecture changes