Blockwise Flow Matching: Improving Flow Matching Models For Efficient High-Quality Generation

📅 2025-10-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Flow Matching (FM) models rely on a single monolithic network to model the entire generation trajectory, struggling to capture heterogeneous signal characteristics across timesteps while incurring high inference costs due to repeated full forward passes. To address this, the authors propose Blockwise Flow Matching (BFM): a framework that partitions the generative trajectory into multiple temporal segments, each modeled independently by a smaller, specialized velocity block. They further introduce a Semantic Feature Guidance module that conditions the velocity blocks on features aligned with pretrained representations, and a lightweight Feature Residual Approximation strategy that preserves semantic quality while cutting inference cost. BFM achieves high-fidelity generation with substantially reduced computational overhead: on ImageNet 256×256, it delivers 2.1–4.9× inference speedups over state-of-the-art FM methods at comparable generation quality, improving the Pareto frontier of the quality-efficiency trade-off in existing FM approaches.

📝 Abstract
Recently, Flow Matching models have pushed the boundaries of high-fidelity data generation across a wide range of domains. They typically employ a single large network to learn the entire generative trajectory from noise to data. Despite their effectiveness, this design struggles to capture distinct signal characteristics across timesteps simultaneously and incurs substantial inference costs due to the iterative evaluation of the entire model. To address these limitations, we propose Blockwise Flow Matching (BFM), a novel framework that partitions the generative trajectory into multiple temporal segments, each modeled by smaller but specialized velocity blocks. This blockwise design enables each block to specialize effectively in its designated interval, improving inference efficiency and sample quality. To further enhance generation fidelity, we introduce a Semantic Feature Guidance module that explicitly conditions velocity blocks on semantically rich features aligned with pretrained representations. Additionally, we propose a lightweight Feature Residual Approximation strategy that preserves semantic quality while significantly reducing inference cost. Extensive experiments on ImageNet 256×256 demonstrate that BFM establishes a substantially improved Pareto frontier over existing Flow Matching methods, achieving 2.1× to 4.9× accelerations in inference complexity at comparable generation performance. Code is available at https://github.com/mlvlab/BFM.
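The core idea — routing each timestep of the generative ODE to a specialized velocity block for its temporal segment — can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the velocity blocks here are simple callables standing in for the trained networks, the segment boundaries and Euler integrator are generic FM sampling machinery, and the Semantic Feature Guidance and Feature Residual Approximation components are omitted.

```python
import numpy as np

def blockwise_euler_sample(x, velocity_blocks, boundaries, num_steps=100):
    """Integrate dx/dt = v_k(x, t) from t=0 to t=1 with Euler steps,
    where v_k is the velocity block whose temporal segment
    [boundaries[k], boundaries[k+1]) contains the current time t."""
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        # pick the specialized block for the segment containing t
        k = int(np.searchsorted(boundaries, t, side="right")) - 1
        k = min(max(k, 0), len(velocity_blocks) - 1)
        x = x + dt * velocity_blocks[k](x, t)
    return x

# Toy usage: two "blocks" that both realize the straight-line velocity
# v(x, t) = x1 - x0 of a linear interpolation path; starting from x0 = 0
# with target x1 = 1, the sampler should land near 1.0.
blocks = [lambda x, t: 1.0, lambda x, t: 1.0]
x_final = blockwise_euler_sample(0.0, blocks, boundaries=[0.0, 0.5], num_steps=100)
```

In practice, the appeal of this design is that each block only needs capacity for its own interval of the trajectory, so the per-step network evaluated at inference time can be much smaller than a single monolithic model covering all timesteps.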
Problem

Research questions and friction points this paper is trying to address.

Improving flow matching models for efficient high-quality generation
Partitioning generative trajectory into specialized temporal segments
Reducing inference costs while maintaining generation fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Partitions generative trajectory into specialized temporal segments
Introduces semantic feature guidance using pretrained representations
Employs lightweight feature residual approximation for efficiency