🤖 AI Summary
This work addresses a key obstacle for normalizing flows in high-dimensional image generation: because a flow must learn a single invertible mapping over the full ambient space, it struggles to capture semantic structure that is compact but encoded in overcomplete features. The authors propose SRC-Flow, whose Semantic Representation Compressor (SRC) compresses high-dimensional RAE features into a low-dimensional semantic space for flow modeling, while the frozen RAE decoder preserves reconstruction. The compact space reduces the modeling burden on the flow and enables effective likelihood-based generation in semantic representation space. The method also adopts constant-noise regularization tailored to the fixed unconditional bijection that a flow learns. On ImageNet at 256×256 and 512×512 resolutions, SRC-Flow achieves state-of-the-art generation quality among normalizing flow methods, with gFID scores of 1.65 and 2.07 respectively under classifier-free guidance, while retaining exact likelihood evaluation in the compact semantic space and deterministic invertible sampling at the flow level.
📝 Abstract
Normalizing flows (NFs) provide exact likelihoods and deterministic invertible sampling, but have historically lagged behind diffusion models for large-scale image generation. We identify a key obstacle: NFs are required to learn a single invertible transport over the full ambient space, which makes them highly sensitive to the dimensionality of the representation. This leads to a semantic-capacity mismatch in modern visual representation spaces, where semantic information is compact but encoded in overcomplete features. We propose SRC-Flow, which introduces a Semantic Representation Compressor (SRC) that compresses high-dimensional RAE features into a low-dimensional semantic space before flow modeling, while preserving reconstruction through the frozen RAE decoder. This compact space reduces the modeling burden of NFs and enables effective likelihood-based generation in semantic representation space. We further adopt constant noise regularization tailored to the fixed unconditional bijection learned by flows. On ImageNet $256 \times 256$ and $512 \times 512$, SRC-Flow achieves state-of-the-art generation quality among normalizing flow methods, with gFID scores of 1.65 and 2.07, respectively, under classifier-free guidance, while retaining exact likelihood computation in the compact semantic representation space and deterministic invertible sampling at the flow level. Code and models will be available at https://github.com/longtaojiang/SRC-Flow.
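To make "exact likelihoods and deterministic invertible sampling" concrete, the following is a minimal toy sketch, not the SRC-Flow architecture. It assumes a single elementwise affine bijection $z = (x - \mu)\,e^{-s}$ with a standard normal base density, so the change-of-variables formula $\log p_X(x) = \log p_Z(f(x)) + \log\left|\det \partial f / \partial x\right|$ can be evaluated in closed form. The names `forward`, `inverse`, `log_prob`, `shift`, and `log_scale` are hypothetical and chosen only for illustration.

```python
import math

def forward(x, shift, log_scale):
    """Map data x to latent z; return (z, log|det dz/dx|)."""
    z = [(xi - m) * math.exp(-s) for xi, m, s in zip(x, shift, log_scale)]
    # The Jacobian is diagonal with entries exp(-s), so its log-determinant is -sum(s).
    log_det = -sum(log_scale)
    return z, log_det

def inverse(z, shift, log_scale):
    """Deterministic inverse map from latent z back to data x."""
    return [zi * math.exp(s) + m for zi, m, s in zip(z, shift, log_scale)]

def log_prob(x, shift, log_scale):
    """Exact log-likelihood of x via the change-of-variables formula."""
    z, log_det = forward(x, shift, log_scale)
    # Log-density of a standard normal base distribution, summed over dimensions.
    log_base = sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
    return log_base + log_det

# Toy parameters (assumed values, not learned).
shift = [0.5, -1.0, 2.0]
log_scale = [0.1, -0.3, 0.7]

x = [1.0, 0.0, 3.0]
z, _ = forward(x, shift, log_scale)
x_rec = inverse(z, shift, log_scale)  # recovers x up to floating-point error
```

Because every dimension of the input contributes to both the transform and the Jacobian term, the cost and difficulty of fitting such a bijection grow with the dimensionality of the space, which is the motivation the abstract gives for modeling a compact semantic space instead of the full feature space.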