SRC-Flow: Compact Semantic Representations Enable Normalizing Flows for Image Generation

📅 2026-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key difficulty for normalizing flows in image generation: a flow must learn a single invertible mapping over the full ambient space, which makes it highly sensitive to high-dimensional, overcomplete representations. The authors propose a Semantic Representation Compressor (SRC) that compresses high-dimensional RAE features into a low-dimensional semantic space for flow modeling, while a frozen RAE decoder preserves reconstruction fidelity. They also adopt a constant noise regularization tailored to the fixed unconditional bijection that flows learn. On ImageNet at 256×256 and 512×512, the method achieves state-of-the-art generation quality among normalizing flow methods, with gFID scores of 1.65 and 2.07 respectively under classifier-free guidance, while retaining exact likelihood computation in the compact semantic space and deterministic invertible sampling at the flow level.
📝 Abstract
Normalizing flows (NFs) provide exact likelihoods and deterministic invertible sampling, but have historically lagged behind diffusion models for large-scale image generation. We identify a key obstacle: NFs are required to learn a single invertible transport over the full ambient space, making them highly sensitive to high-dimensional representations. This leads to a semantic-capacity mismatch in modern visual representation spaces, where semantic information is compact but encoded in overcomplete features. We propose SRC-Flow, which introduces a Semantic Representation Compressor (SRC) that compacts high-dimensional RAE features into a low-dimensional semantic space before flow modeling and preserves reconstruction through the frozen RAE decoder. This compact space reduces the modeling burden of NFs and enables effective likelihood-based generation in semantic representation space. We further adopt constant noise regularization tailored to the fixed unconditional bijection learned by flows. On ImageNet $256 \times 256$ and $512 \times 512$, SRC-Flow achieves state-of-the-art generation quality among normalizing flow methods, with gFID scores of 1.65 and 2.07 under classifier-free guidance, while retaining exact likelihood computation in the compact semantic representation space and deterministic invertible sampling at the flow level. Code and models will be available at https://github.com/longtaojiang/SRC-Flow.
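To make "exact likelihood" and "deterministic invertible sampling" concrete, here is a minimal NumPy sketch of a single affine coupling layer acting on a small latent vector. It is not the SRC-Flow architecture. The latent size `DIM`, the noise level `SIGMA`, the random weights, and the random stand-in latents are all hypothetical, and reading "constant noise regularization" as Gaussian noise with one fixed standard deviation added to the latents during training is an assumption, not a detail confirmed by the abstract.

```python
import numpy as np

DIM = 8          # hypothetical size of the compact semantic space
HALF = DIM // 2  # the first half conditions the transform of the second half
SIGMA = 0.1      # hypothetical fixed noise level (assumed reading of "constant noise")


def _scale_shift(x1, w, b):
    """Linear conditioner: maps the untouched half to a log-scale and a shift."""
    h = x1 @ w + b
    return np.tanh(h[:, :HALF]), h[:, HALF:]


def forward(x, w, b):
    """Map data x to base variable z; also return log|det Jacobian| per sample."""
    x1, x2 = x[:, :HALF], x[:, HALF:]
    log_s, t = _scale_shift(x1, w, b)
    z = np.concatenate([x1, x2 * np.exp(log_s) + t], axis=1)
    return z, log_s.sum(axis=1)


def inverse(z, w, b):
    """Exact inverse of forward: deterministic, no iterative solver needed."""
    z1, z2 = z[:, :HALF], z[:, HALF:]
    log_s, t = _scale_shift(z1, w, b)
    return np.concatenate([z1, (z2 - t) * np.exp(-log_s)], axis=1)


def log_likelihood(x, w, b):
    """Change of variables: log p(x) = log N(f(x); 0, I) + log|det df/dx|."""
    z, log_det = forward(x, w, b)
    log_pz = -0.5 * (z ** 2).sum(axis=1) - 0.5 * DIM * np.log(2.0 * np.pi)
    return log_pz + log_det


rng = np.random.default_rng(0)
w = 0.1 * rng.standard_normal((HALF, DIM))  # untrained, illustrative weights
b = np.zeros(DIM)

latents = rng.standard_normal((16, DIM))    # stand-in for compressed features
noisy = latents + SIGMA * rng.standard_normal(latents.shape)  # same sigma for every sample

nll = -log_likelihood(noisy, w, b).mean()   # the quantity a flow would minimize
z, _ = forward(latents, w, b)
recovered = inverse(z, w, b)                # round trip back to the latents
```

The Jacobian of a coupling layer is triangular, so its log-determinant is just the sum of the log-scales. Both the forward and inverse passes cost O(`DIM`) per layer, which is one way to see why a smaller latent space lightens the flow's modeling burden.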
Problem

Research questions and friction points this paper is trying to address.

Normalizing Flows
Image Generation
Semantic Representation
High-dimensional Representations
Likelihood-based Modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Normalizing Flows
Semantic Representation Compression
Compact Latent Space
Exact Likelihood
Deterministic Invertible Sampling