Path-Guided Flow Matching for Dataset Distillation

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing diffusion-based dataset distillation methods rely on heuristic guidance or prototype assignment, leading to low sampling efficiency, unstable trajectories, and poor generalization. This work proposes Path-Guided Flow Matching (PGFM), a novel framework that introduces flow matching into generative dataset distillation for the first time. Operating in the latent space of a frozen VAE, PGFM enables rapid deterministic synthesis with only a few ODE steps and incorporates a continuous path-to-prototype guidance mechanism that balances diversity, efficiency, and stability. Experimental results demonstrate that PGFM achieves competitive performance on high-resolution benchmarks, surpassing existing diffusion-based approaches with a 7.6× faster sampling speed and a mode coverage rate of 78%.

📝 Abstract
Dataset distillation compresses large datasets into compact synthetic sets that train models to comparable performance. Despite recent progress on diffusion-based distillation, methods of this type typically depend on heuristic guidance or prototype assignment, which incurs time-consuming sampling and trajectory instability and thus hurts downstream generalization, especially under strong control or low images-per-class (IPC). We propose \emph{Path-Guided Flow Matching (PGFM)}, the first flow matching-based framework for generative distillation, which enables fast deterministic synthesis by solving an ODE in a few steps. PGFM conducts flow matching in the latent space of a frozen VAE to learn class-conditional transport from Gaussian noise to the data distribution. In particular, we develop a continuous path-to-prototype guidance algorithm for ODE-consistent path control, which allows trajectories to reliably land on assigned prototypes while preserving diversity and efficiency. Extensive experiments across high-resolution benchmarks demonstrate that PGFM matches or surpasses prior diffusion-based distillation approaches with fewer sampling steps while delivering competitive performance with remarkably improved efficiency, e.g., 7.6$\times$ more efficient than diffusion-based counterparts with 78\% mode coverage.
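The paper's implementation is not shown here, but the core sampling idea in the abstract, integrating a learned velocity field with a few deterministic ODE steps so that the trajectory lands near an assigned prototype, can be illustrated with a minimal sketch. The Euler integrator, the toy linear velocity field, and all names below are hypothetical stand-ins, not the authors' method:

```python
import numpy as np

def sample_few_step_ode(velocity_field, z0, num_steps=4):
    """Integrate dz/dt = v(z, t) from t=0 (noise) to t=1 with Euler steps.

    `velocity_field` stands in for a learned class-conditional flow model;
    here any callable (z, t) -> dz/dt works. Few steps suffice when the
    learned transport paths are nearly straight.
    """
    z = z0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        z = z + dt * velocity_field(z, t)
    return z

# Toy velocity field: straight-line transport toward a fixed "prototype"
# latent, mimicking path-to-prototype guidance in spirit (illustrative only).
prototype = np.ones(4)
v = lambda z, t: prototype - z  # pulls the state toward the prototype

z0 = np.zeros(4)  # stand-in for a Gaussian-noise latent, fixed for reproducibility
z1 = sample_few_step_ode(v, z0, num_steps=4)
print(z1)  # each coordinate has moved most of the way toward the prototype
```

With this linear field, each Euler step contracts the gap to the prototype by a factor of 0.75, so four steps recover about 68% of the distance; a trained flow model would instead produce diverse samples whose endpoints are steered toward, rather than collapsed onto, their prototypes.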
Problem

Research questions and friction points this paper is trying to address.

dataset distillation
diffusion-based distillation
trajectory instability
heuristic guidance
low IPC
Innovation

Methods, ideas, or system contributions that make the work stand out.

flow matching
dataset distillation
path-guided synthesis
latent ODE
prototype guidance
Xuhui Li
Department of Machine Learning, MBZUAI, Abu Dhabi, UAE
Zhengquan Luo
Department of Machine Learning, MBZUAI, Abu Dhabi, UAE
Xiwei Liu
Department of Computer Vision, MBZUAI, Abu Dhabi, UAE
Yongqiang Yu
Department of Computer Vision, MBZUAI, Abu Dhabi, UAE
Zhiqiang Xu
Professor, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
approximation theory
compressed sensing
splines
frame theory
quantization