Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models

📅 2025-12-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing flow- and diffusion-based generative models require hundreds to thousands of neural function evaluations (NFEs) per likelihood computation, severely limiting the efficiency of both sampling and likelihood estimation. Mainstream distillation methods accelerate sampling but forfeit exact likelihood computability. This paper proposes F2D2, a joint distillation framework that, for the first time, enables end-to-end co-distillation of sampling trajectories and cumulative divergence. By sharing the velocity field and introducing a lightweight divergence prediction head, F2D2 remains compatible with few-step sampling models without modifying the backbone architecture. The method produces both high-fidelity samples and accurate log-likelihood estimates using only two NFEs, outperforming teacher models that require over a thousand steps. F2D2 thus resolves the long-standing likelihood-intractability bottleneck in efficient generative modeling.

📝 Abstract
Log-likelihood evaluation enables important capabilities in generative models, including model comparison, certain fine-tuning objectives, and many downstream applications. Yet paradoxically, some of today's best generative models -- diffusion and flow-based models -- still require hundreds to thousands of neural function evaluations (NFEs) to compute a single likelihood. While recent distillation methods have successfully accelerated sampling to just a few steps, they achieve this at the cost of likelihood tractability: existing approaches either abandon likelihood computation entirely or still require expensive integration over full trajectories. We present fast flow joint distillation (F2D2), a framework that simultaneously reduces the number of NFEs required for both sampling and likelihood evaluation by two orders of magnitude. Our key insight is that in continuous normalizing flows, the coupled ODEs for sampling and likelihood are computed from a shared underlying velocity field, allowing us to jointly distill both the sampling trajectory and cumulative divergence using a single model. F2D2 is modular, compatible with existing flow-based few-step sampling models, and requires only an additional divergence prediction head. Experiments demonstrate F2D2's capability of achieving accurate log-likelihood with few-step evaluations while maintaining high sample quality, solving a long-standing computational bottleneck in flow-based generative models. As an application of our approach, we propose a lightweight self-guidance method that enables a 2-step MeanFlow model to outperform a 1024-step teacher model with only a single additional backward NFE.
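The coupled ODEs the abstract refers to can be made concrete with a toy continuous normalizing flow. The sketch below is plain NumPy, not the paper's code; the linear velocity field, Euler step count, and standard-normal base density are illustrative assumptions. It integrates the sampling ODE and the instantaneous change-of-variables formula from one shared velocity field, which is the structural fact F2D2 exploits:

```python
import numpy as np

A = 0.5  # coefficient of the toy linear velocity field (illustrative)

def velocity(x, t):
    # toy velocity field v(x, t) = A * x, chosen so the flow is solvable in closed form
    return A * x

def divergence(x, t):
    # div v = A * dim for the linear field above
    return A * x.size

def sample_with_loglik(x0, log_p0, n_steps=1000):
    # Euler-integrate the coupled ODEs:
    #   dx/dt        =  v(x, t)        (sampling trajectory)
    #   d log p / dt = -div v(x, t)    (instantaneous change of variables)
    x, log_p = x0.copy(), log_p0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        log_p -= divergence(x, t) * dt
        x = x + velocity(x, t) * dt
    return x, log_p

# log-density of a standard-normal base distribution at x0
x0 = np.array([0.3, -1.2])
log_p0 = -0.5 * float(x0 @ x0) - x0.size / 2 * np.log(2 * np.pi)
x1, log_p1 = sample_with_loglik(x0, log_p0)
# closed form for this field: x1 = exp(A) * x0 and log p1 = log p0 - A * dim
```

For this linear field the divergence is constant, so the accumulated log-density change is exactly -A * dim. A teacher model pays one NFE per integration step here; collapsing that integral into a few distilled evaluations is the cost reduction the paper targets.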
Problem

Research questions and friction points this paper is trying to address.

Likelihood evaluation in flow- and diffusion-based models requires hundreds to thousands of NFEs
Existing distillation methods accelerate sampling but abandon exact likelihood computation or still integrate over full trajectories
Reducing NFEs for both sampling and likelihood without sacrificing sample quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint distillation for fast sampling and likelihood evaluation
Modular framework with additional divergence prediction head
Lightweight self-guidance: a 2-step student outperforms a 1024-step teacher with one extra backward NFE
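The divergence-prediction-head idea can be sketched as a shared trunk with two output heads: one for the (distilled) velocity over a coarse step, one for the accumulated divergence over that same step. Everything below is an illustrative assumption (random weights, toy shapes, a two-step schedule), not the paper's architecture; it only shows how sampling and log-likelihood updates can share one forward pass:

```python
import numpy as np

rng = np.random.default_rng(0)

class FewStepFlowWithDivHead:
    """Toy sketch: shared trunk, velocity head, and a lightweight
    divergence head predicting the step-averaged divergence.
    Names, sizes, and weights are illustrative, not the paper's."""

    def __init__(self, dim, hidden=32):
        self.W1 = rng.normal(scale=0.1, size=(dim + 1, hidden))  # shared trunk
        self.Wv = rng.normal(scale=0.1, size=(hidden, dim))      # velocity head
        self.Wd = rng.normal(scale=0.1, size=(hidden, 1))        # divergence head

    def step(self, x, t, dt):
        h = np.tanh(np.concatenate([x, [t]]) @ self.W1)
        v = h @ self.Wv              # distilled average velocity over [t, t + dt]
        div = float(h @ self.Wd)     # predicted step-averaged divergence
        # one NFE yields both the next state and the log-density increment
        return x + dt * v, -dt * div

model = FewStepFlowWithDivHead(dim=2)
x = np.array([0.1, -0.4])
log_p = 0.0  # base log-density omitted; only the increments are demonstrated
for t in (0.0, 0.5):  # two NFEs: sample and log-likelihood change together
    x, dlp = model.step(x, t, dt=0.5)
    log_p += dlp
```

The design point being illustrated: because the divergence head reads the same trunk features as the velocity head, adding likelihood estimation costs no extra NFEs beyond the few-step sampling pass.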