Distribution Matching Distillation without Fake Score Network

📅 2026-05-18
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the memory and update overhead that Distribution Matching Distillation (DMD) incurs by relying on an auxiliary fake-score network. The authors propose FSF-DMD, a DMD formulation for flow-map generators that drops the explicit score estimator and instead uses the generator's own endpoint pseudo-velocity as a surrogate for the reverse-divergence signal. The objective is extended with flow-map-consistent backward simulation, and a self-teacher variant supports training from scratch. On ImageNet-1K at 256×256 resolution, the method improves on flow-map baselines, reaches lower FID than the listed DMD2 comparisons in the flow-map-initialized setting, and remains effective under flow-matching initialization and when trained from scratch.
📝 Abstract
Distribution Matching Distillation (DMD) provides an effective distribution-level correction for few-step generation, but it relies on an auxiliary fake-score network to track the evolving generative distribution. Recent work combines DMD-style objectives with flow-map generators to exploit both forward-divergence training and reverse-divergence correction. However, the fake-score estimator remains an additional component with memory and update overhead. In this work, we study whether this explicit tracker can be avoided when the generator itself has a flow-map structure. We propose Fake-Score-network-Free DMD (FSF-DMD), a DMD formulation for flow-map generators that replaces the auxiliary fake-score estimator with a generator-induced pseudo-velocity surrogate. The key observation is that the endpoint pseudo-velocity of a flow-map generator provides a tractable proxy for fake-velocity estimation, allowing the generator itself to supply the reverse-divergence signal. Building on this observation, we derive a practical objective, extend it with flow-map-consistent backward simulation, and introduce a self-teacher variant for training from scratch. In our ImageNet-1K $256 \times 256$ experiments, FSF-DMD improves flow-map baselines, reaches lower FID than the listed DMD2 comparisons in the flow-map-initialized setting, and remains effective under flow-matching initialization and training from scratch.
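To make the role of the fake score concrete, here is a minimal toy sketch of a standard DMD-style generator update. It is not the FSF-DMD objective, whose pseudo-velocity surrogate is not specified on this page. Everything in it is an illustrative assumption: a 1-D generator `G(z) = z + theta`, a Gaussian "real" distribution with mean `mu_real`, and analytic Gaussian scores standing in for both the frozen teacher and the learned fake-score network. The names `gaussian_score`, `dmd_step`, and `train` are hypothetical.

```python
import random


def gaussian_score(x, mean, var):
    """Score (gradient of the log-density) of N(mean, var) at x."""
    return -(x - mean) / var


def dmd_step(theta, mu_real, sigma, lr, rng, batch=64):
    """One DMD-style update for the toy generator G(z) = z + theta."""
    var_t = 1.0 + sigma ** 2  # variance of unit-variance samples after adding noise
    total = 0.0
    for _ in range(batch):
        x = rng.gauss(0.0, 1.0) + theta        # generator sample, x ~ N(theta, 1)
        x_t = x + sigma * rng.gauss(0.0, 1.0)  # noised sample
        # Fake score: must follow the generator's current distribution.
        # Standard DMD learns this with an auxiliary network; here it is analytic.
        s_fake = gaussian_score(x_t, theta, var_t)
        # Real score: stands in for the frozen teacher (assumed Gaussian here).
        s_real = gaussian_score(x_t, mu_real, var_t)
        total += s_fake - s_real
    grad = total / batch  # dG/dtheta = 1 for this generator
    return theta - lr * grad


def train(theta0=-3.0, mu_real=2.0, sigma=0.5, lr=0.5, steps=200, seed=0):
    """Run repeated DMD-style updates and return the final generator mean."""
    rng = random.Random(seed)
    theta = theta0
    for _ in range(steps):
        theta = dmd_step(theta, mu_real, sigma, lr, rng)
    return theta


if __name__ == "__main__":
    print(train())  # the generator mean moves toward mu_real
```

In this toy setting the score difference reduces to `(theta - mu_real) / var_t`, so the update pulls the generator mean toward the real mean. If the fake score were frozen at the initial mean instead of following `theta`, the difference would stay constant and `theta` would drift past `mu_real`. This is why DMD needs a tracker for the evolving generator distribution, and it is the component FSF-DMD proposes to obtain from the generator itself.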
Problem

Research questions and friction points this paper is trying to address.

Distribution Matching Distillation
fake-score network
flow-map generator
reverse-divergence correction
few-step generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distribution Matching Distillation
Flow-Map Generator
Fake-Score-Free
Pseudo-Velocity Surrogate
Self-Teacher Training