Frequency-Aware Flow Matching for High-Quality Image Generation

📅 2026-04-16
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the imbalanced modeling of frequency components in conventional flow matching approaches for image generation, which often leads to insufficient recovery of high-frequency details. To overcome this limitation, the authors propose Frequency-Aware Flow Matching (FreqFlow), which integrates frequency-aware conditioning into the flow matching framework. FreqFlow uses a two-branch architecture: a frequency branch separately processes low- and high-frequency components, and a spatial branch synthesizes the image in the latent domain, guided by the frequency branch's output. A time-dependent adaptive weighting mechanism balances global structure against local texture over the course of generation. On class-conditional ImageNet-256 generation, the method achieves an FID of 1.38, improving on DiT by 0.79 FID and on SiT by 0.58 FID, with better image coherence and texture sharpness.
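To make the idea of time-dependent weighting concrete, here is a minimal sketch in Python. It is not the authors' mechanism, which the summary does not specify in detail: the sigmoid schedule, the function names `frequency_weights` and `blend_conditioning`, and the parameters `sharpness` and `midpoint` are all hypothetical. The sketch assumes a generation time `t` in [0, 1], with `t = 0` at pure noise and `t = 1` at the finished sample.

```python
import numpy as np

def frequency_weights(t, sharpness=10.0, midpoint=0.5):
    """Hypothetical schedule: favor low frequencies early, high frequencies late.

    t is the generation time in [0, 1] (assumed: 0 = noise, 1 = data).
    Returns (w_low, w_high), which always sum to 1.
    """
    w_high = 1.0 / (1.0 + np.exp(-sharpness * (t - midpoint)))
    w_low = 1.0 - w_high
    return w_low, w_high

def blend_conditioning(low, high, t):
    """Combine low- and high-frequency features with the time-dependent weights."""
    w_low, w_high = frequency_weights(t)
    return w_low * low + w_high * high

# Print the schedule at a few points along the generation trajectory.
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    w_low, w_high = frequency_weights(t)
    print(f"t={t:.2f}  w_low={w_low:.3f}  w_high={w_high:.3f}")
```

A fixed schedule like this one only illustrates the direction of the effect. An adaptive variant would presumably learn the weights from the time step rather than hard-code them.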

📝 Abstract
Flow matching models have emerged as a powerful framework for realistic image generation by learning to reverse a corruption process that progressively adds Gaussian noise. However, because noise is injected in the latent domain, its impact on different frequency components is non-uniform. As a result, during inference, flow matching models tend to generate low-frequency components (global structure) in the early stages, while high-frequency components (fine details) emerge only later in the reverse process. Building on this insight, we propose Frequency-Aware Flow Matching (FreqFlow), a novel approach that explicitly incorporates frequency-aware conditioning into the flow matching framework via time-dependent adaptive weighting. We introduce a two-branch architecture: (1) a frequency branch that separately processes low- and high-frequency components to capture global structure and refine textures and edges, and (2) a spatial branch that synthesizes images in the latent domain, guided by the frequency branch's output. By explicitly integrating frequency information into the generation process, FreqFlow ensures that both large-scale coherence and fine-grained details are effectively modeled: low-frequency conditioning reinforces global structure, while high-frequency conditioning enhances texture fidelity and detail sharpness. On the class-conditional ImageNet-256 generation benchmark, our method achieves state-of-the-art performance with an FID of 1.38, surpassing the prior diffusion model DiT and flow matching model SiT by 0.79 and 0.58 FID, respectively. Code is available at https://github.com/OliverRensu/FreqFlow.
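The observation that noise affects frequency bands unevenly can be checked with a small NumPy experiment. This is an illustrative sketch, not code from the FreqFlow repository. It assumes a plain 2-D array in place of a latent, a radial FFT mask for the low/high split, a linear path `x_t = t * signal + (1 - t) * noise` with `t = 0` at pure noise, and a synthetic signal whose energy sits mostly at low frequencies. The names `split_frequencies`, `band_snr`, and `cutoff` are hypothetical.

```python
import numpy as np

def split_frequencies(x, cutoff=0.125):
    """Split a 2-D array into low- and high-frequency parts.

    cutoff is a radius in cycles per sample (Nyquist is 0.5).
    The two parts sum back to x by construction.
    """
    h, w = x.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    low_mask = np.sqrt(fx**2 + fy**2) <= cutoff
    low = np.fft.ifft2(np.fft.fft2(x) * low_mask).real
    return low, x - low

def band_snr(signal, noise, t, cutoff=0.125):
    """Signal-to-noise ratio per band for x_t = t * signal + (1 - t) * noise."""
    s_low, s_high = split_frequencies(signal, cutoff)
    n_low, n_high = split_frequencies(noise, cutoff)

    def snr(s, n):
        return (t**2 * np.mean(s**2)) / ((1 - t) ** 2 * np.mean(n**2))

    return snr(s_low, n_low), snr(s_high, n_high)

# Synthetic "image": strong smooth structure plus a weak fine texture.
i = np.arange(64)
signal = np.outer(np.sin(2 * np.pi * 2 * i / 64), np.cos(2 * np.pi * 3 * i / 64))
signal = signal + 0.1 * np.sin(2 * np.pi * 24 * i / 64)[:, None]
noise = np.random.default_rng(0).standard_normal((64, 64))

for t in (0.1, 0.5, 0.9):
    snr_low, snr_high = band_snr(signal, noise, t)
    print(f"t={t:.1f}  low-band SNR={snr_low:.4f}  high-band SNR={snr_high:.4f}")
```

Under these assumptions the low band stays above the noise floor at every `t`, while the high band only becomes recoverable as `t` approaches 1. This matches the coarse-to-fine ordering the abstract describes.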
Problem

Research questions and friction points this paper is trying to address.

flow matching
frequency components
image generation
high-frequency details
low-frequency structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

frequency-aware
flow matching
two-branch architecture
adaptive weighting
high-fidelity image generation