Distilling Normalizing Flows

📅 2025-06-26

🤖 AI Summary

This work addresses the performance bottlenecks of compact normalizing flow (NF) models in density estimation and sample quality. We propose a novel knowledge distillation framework specifically designed for NF architectures, moving beyond conventional output-layer distillation to enable asymmetric, structure-aware knowledge transfer at intermediate latent layers, an approach particularly suited to the modular design of compositional NFs. By explicitly modeling probabilistic flow mappings between corresponding teacher and student layers, our method significantly improves parameter efficiency and inference speed of student models. Experiments demonstrate that distilled compact NFs achieve 23–37% lower density estimation error, 18–41% improvement in sampling Fréchet Inception Distance (FID), 2.1× higher throughput, and 58% reduction in computational overhead on standard benchmarks. The approach establishes a scalable paradigm for lightweight generative modeling.

📝 Abstract

Explicit density learners are becoming an increasingly popular technique for generative models because of their ability to better model probability distributions. They have advantages over Generative Adversarial Networks because they can perform density estimation and exact latent-variable inference. This brings several benefits, including the ability to interpolate simply, calculate sample likelihood, and analyze the probability distribution. The downside of these models is that they are often more difficult to train and have lower sampling quality. Normalizing flows are explicit density models that use composable bijective functions to turn an intractable probability function into a tractable one. In this work, we present novel knowledge distillation techniques to increase the sampling quality and density estimation of smaller student normalizing flows. We seek to study the capacity of knowledge distillation in Compositional Normalizing Flows to understand the benefits and weaknesses provided by these architectures. Normalizing flows have unique properties that allow for non-traditional forms of knowledge transfer, where knowledge can be transferred within intermediate layers. We find that through this distillation, we can make students significantly smaller while making substantial performance gains over a non-distilled student. Smaller models also have proportionally higher throughput, since throughput depends on the number of bijectors, and thus parameters, in the network.
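To make "composable bijective functions" concrete, here is a minimal NumPy sketch of how a flow can compute an exact log-density with the change-of-variables rule, log p(x) = log p_base(z) + Σ log|det J_k|. It is not the paper's architecture: the `AffineBijector` class, the helper names, and the standard-normal base distribution are all illustrative assumptions.

```python
import numpy as np


class AffineBijector:
    """Hypothetical toy layer: elementwise map z = x * exp(log_scale) + shift."""

    def __init__(self, log_scale, shift):
        self.log_scale = np.asarray(log_scale, dtype=float)
        self.shift = np.asarray(shift, dtype=float)

    def forward(self, x):
        z = x * np.exp(self.log_scale) + self.shift
        # The Jacobian is diagonal, so log|det J| is the sum of the log scales.
        log_det = np.full(x.shape[0], self.log_scale.sum())
        return z, log_det

    def inverse(self, z):
        return (z - self.shift) * np.exp(-self.log_scale)


def flow_log_prob(bijectors, x):
    """Exact log-density of x under a flow with a standard-normal base (assumed)."""
    z = x
    log_det_total = np.zeros(x.shape[0])
    for bijector in bijectors:
        z, log_det = bijector.forward(z)
        log_det_total += log_det
    dim = z.shape[1]
    base_log_prob = -0.5 * (z ** 2).sum(axis=1) - 0.5 * dim * np.log(2.0 * np.pi)
    return base_log_prob + log_det_total


def flow_sample(bijectors, n, dim, rng):
    """Draw base samples and push them back through the inverted layers."""
    z = rng.standard_normal((n, dim))
    for bijector in reversed(bijectors):
        z = bijector.inverse(z)
    return z


flow = [AffineBijector([0.5, -0.3], [1.0, 2.0]),
        AffineBijector([0.1, 0.2], [-1.0, 0.5])]
rng = np.random.default_rng(0)
samples = flow_sample(flow, n=4, dim=2, rng=rng)
log_probs = flow_log_prob(flow, samples)  # one exact log-density per sample
```

Both functions loop once over the list of bijectors, so the cost of evaluation and sampling grows with the number of layers. This is the relationship the abstract appeals to when it ties throughput to the number of bijectors.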

Problem

Research questions and friction points this paper is trying to address.

Improving sampling quality in normalizing flows

Enhancing density estimation via knowledge distillation

Reducing model size while maintaining performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge distillation for normalizing flows

Transfer knowledge within intermediate layers

Smaller student models that outperform their non-distilled counterparts
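One way to picture knowledge transfer within intermediate layers: every bijector preserves dimensionality, so the teacher's and student's intermediate latents have the same shape and can be compared directly. The sketch below is an assumption about what such a loss could look like, not the paper's stated objective. The helper names, the mean-squared-error criterion, and the rule that student layer i is matched to teacher layer (i + 1) * stride - 1 are all hypothetical.

```python
import numpy as np


def affine(scale, shift):
    """Hypothetical toy bijector: h -> scale * h + shift (scale must be nonzero)."""
    return lambda h: scale * h + shift


def intermediate_states(layers, x):
    """Return the activation after every layer of a flow given as a list of callables."""
    states = []
    h = x
    for layer in layers:
        h = layer(h)
        states.append(h)
    return states


def latent_matching_loss(teacher_layers, student_layers, x):
    """Mean squared error between aligned intermediate latents (assumed alignment)."""
    t_states = intermediate_states(teacher_layers, x)
    s_states = intermediate_states(student_layers, x)
    if len(t_states) % len(s_states) != 0:
        raise ValueError("teacher depth must be a multiple of student depth")
    stride = len(t_states) // len(s_states)
    # Student layer i is compared with the teacher layer that closes block i.
    losses = [
        np.mean((s - t_states[(i + 1) * stride - 1]) ** 2)
        for i, s in enumerate(s_states)
    ]
    return float(np.mean(losses))


# A 4-layer teacher and a 2-layer student; each student layer is built to
# reproduce a pair of teacher layers exactly.
teacher = [affine(2.0, 1.0), affine(0.5, -1.0), affine(3.0, 0.0), affine(1.0, 2.0)]
student = [affine(1.0, -0.5), affine(3.0, 2.0)]

x = np.random.default_rng(0).standard_normal((8, 3))
loss = latent_matching_loss(teacher, student, x)
```

In practice a term like this would presumably be combined with the student's own negative log-likelihood, with the weighting between the two left as a tuning choice.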
