🤖 AI Summary
This work proposes Normalized Flow Matching (NFM), a generative training framework that addresses a limitation of standard flow matching: training pairs are drawn from independent or optimal transport (OT) couplings, which struggle to capture complex dependencies between noise and data. NFM exploits the bijection between noise and data learned by a pretrained autoregressive normalizing flow (AR-NF) to distill a quasi-deterministic coupling, which is then used to train a lightweight student flow model. The method thus combines autoregressive normalizing flows, flow matching, and knowledge distillation. Experimental results show that the student model significantly outperforms flow models trained with independent or OT couplings and also improves on the teacher AR-NF model.
📝 Abstract
Flow models have rapidly become the go-to method for training and deploying large-scale generators, owing their success to inference-time flexibility via adjustable integration steps. A crucial ingredient in flow training is the choice of coupling measure for sampling noise/data pairs that define the flow matching (FM) regression loss. While FM training usually defaults to independent coupling, recent works show that adaptive couplings informed by noise/data distributions (e.g., via optimal transport, OT) improve both model training and inference. We radicalize this insight by shifting the paradigm: rather than computing adaptive couplings directly, we use distilled couplings from a different, pretrained model capable of placing noise and data spaces in bijection -- a property intrinsic to normalizing flows (NF) through their maximum likelihood and invertibility requirements. Leveraging recent advances in NF image generation via autoregressive (AR) blocks, we propose Normalized Flow Matching (NFM), a new method that distills the quasi-deterministic coupling of pretrained NF models to train student flow models. These students achieve the best of both worlds: significantly outperforming flow models trained with independent or even OT couplings, while also improving on the teacher AR-NF model.
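To make the role of the coupling concrete, here is a minimal NumPy sketch of the FM regression loss under two couplings. It is an illustration under stated assumptions, not the paper's implementation: it assumes the common linear path x_t = (1 - t) x0 + t x1 with target velocity x1 - x0, and it replaces the pretrained AR-NF teacher with a hypothetical invertible affine map (`teacher_forward`, `teacher_inverse`). All function names and the toy parameters `scale` and `shift` are invented for this example.

```python
import numpy as np

# Hypothetical stand-in for a pretrained NF teacher: an invertible affine map.
# A real AR-NF would be a learned network; only the bijection matters here.
scale = np.array([2.0, 0.5])
shift = np.array([1.0, -3.0])

def teacher_forward(x):
    """Data -> noise direction of the toy teacher."""
    return (x - shift) / scale

def teacher_inverse(z):
    """Noise -> data direction of the toy teacher."""
    return z * scale + shift

def fm_loss(velocity_fn, x0, x1, t):
    """FM regression loss for the (assumed) linear path between x0 and x1."""
    tt = t[:, None]
    x_t = (1.0 - tt) * x0 + tt * x1   # point on the path at time t
    target = x1 - x0                  # velocity of the linear path
    pred = velocity_fn(x_t, t)
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))

def exact_velocity(x_t, t):
    """Closed-form velocity field induced by the toy teacher coupling."""
    tt = t[:, None]
    x0 = (x_t - tt * shift) / ((1.0 - tt) + tt * scale)  # invert the path
    return (scale - 1.0) * x0 + shift                    # equals x1 - x0

rng = np.random.default_rng(0)
x1 = teacher_inverse(rng.standard_normal((256, 2)))  # toy "data" samples
t = rng.uniform(size=256)

# Independent coupling: noise is drawn without looking at the data.
x0_indep = rng.standard_normal(x1.shape)
# Teacher coupling: each data point is paired with its own teacher encoding.
x0_teacher = teacher_forward(x1)

loss_indep = fm_loss(exact_velocity, x0_indep, x1, t)
loss_teacher = fm_loss(exact_velocity, x0_teacher, x1, t)
print(f"independent coupling: {loss_indep:.4f}")
print(f"teacher coupling:     {loss_teacher:.2e}")
```

In this toy setting the teacher coupling is fully deterministic, so each pair (x_t, t) corresponds to a single target velocity and the closed-form field fits the targets up to floating-point error. With independent pairs, the same x_t can be reached from many (x0, x1) combinations, so the regression target is noisy and the same field leaves a clearly nonzero loss. A student trained as described in the abstract would replace `exact_velocity` with a learned network and the affine map with the pretrained AR-NF.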