StreamFlow: Theory, Algorithm, and Implementation for High-Efficiency Rectified Flow Generation

📅 2025-11-26

🤖 AI Summary

Rectified Flow (RF) models differ fundamentally from conventional diffusion models in both theory and architecture, which makes existing acceleration techniques incompatible with them. To address this, we propose the first end-to-end acceleration framework designed specifically for RF. The method introduces three core innovations: (1) batched velocity field modeling, which decouples temporal dependencies to enable parallel computation; (2) vectorized scheduling of heterogeneous timesteps, which improves hardware utilization; and (3) dynamic TensorRT compilation, which combines operator-level optimization with memory-access co-design. By tightly integrating flow-matching theory with system-level optimization, the framework achieves up to a 611% speedup on 512×512 image generation, well above the 18% average acceleration of current general-purpose methods, and enables efficient high-resolution deployment of RF models for the first time.
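For reference, the standard rectified-flow formulation trains a velocity field $v_\theta$ on straight-line interpolations between noise $x_0$ and data $x_1$, then generates samples by Euler integration of the learned ODE. The last line is an illustrative reading, not an equation taken from the paper, of how a batch of $B$ samples can be advanced with one velocity evaluation even when each sample $b$ sits at its own timestep $t_b$ with its own step size $\Delta t_b$.

```latex
x_t = t\,x_1 + (1 - t)\,x_0, \qquad x_0 \sim \pi_0 \ (\text{noise}), \quad x_1 \sim \pi_1 \ (\text{data})

\min_\theta \int_0^1 \mathbb{E}\!\left[ \big\lVert (x_1 - x_0) - v_\theta(x_t, t) \big\rVert^2 \right] dt

Z^{(b)}_{t_b + \Delta t_b} = Z^{(b)}_{t_b} + \Delta t_b \, v_\theta\!\big(Z^{(b)}_{t_b}, t_b\big), \qquad b = 1, \dots, B
```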

📝 Abstract

New techniques such as Rectified Flow and Flow Matching have significantly improved generative models over the past two years, particularly in control accuracy, generation quality, and generation efficiency. However, because Rectified Flow differs from existing diffusion models in theory and design, existing acceleration methods cannot be applied to Rectified Flow models directly. In this article, we implement a complete acceleration pipeline spanning theory, design, and inference strategy. The pipeline introduces batch processing with a new velocity field, vectorized batch processing of heterogeneous timesteps, and dynamic TensorRT compilation tailored to these methods, which together accelerate flow-based models across the board. Existing public methods typically achieve an 18% speedup, whereas our experiments show that the new method accelerates 512×512 image generation by up to 611%, far beyond current non-generalized acceleration methods.
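The page does not include code, so the following is only a minimal sketch, assuming a generic rectified-flow Euler sampler, of what vectorizing heterogeneous timesteps could look like: samples at different points along their trajectories share one batched velocity evaluation per step. The names `make_toy_velocity`, `euler_step`, and `sample` are hypothetical, and the toy oracle velocity field stands in for a trained network; it is not StreamFlow's model.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical batched velocity field: one call evaluates v(x, t) for every
# sample, and each sample carries its own timestep t.
BatchedVelocity = Callable[[Sequence[Vector], Sequence[float]], List[Vector]]


def make_toy_velocity(target: Vector) -> BatchedVelocity:
    """Toy stand-in for a trained network (an assumption): the exact
    straight-line velocity toward one point, v(x, t) = (target - x) / (1 - t)."""

    def velocity(xs: Sequence[Vector], ts: Sequence[float]) -> List[Vector]:
        return [
            [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]
            for x, t in zip(xs, ts)
        ]

    return velocity


def euler_step(
    velocity: BatchedVelocity,
    xs: Sequence[Vector],
    ts: Sequence[float],
    dts: Sequence[float],
) -> List[Vector]:
    """One Euler update x <- x + dt * v(x, t) with a per-sample t and dt."""
    vs = velocity(xs, ts)  # a single batched evaluation for the whole batch
    return [[xi + dt * vi for xi, vi in zip(x, v)] for x, v, dt in zip(xs, vs, dts)]


def sample(
    velocity: BatchedVelocity,
    xs: Sequence[Vector],
    starts: Sequence[float],
    num_steps: int,
) -> List[Vector]:
    """Integrate every sample from its own start time to t = 1 in `num_steps`
    Euler steps; step sizes differ per sample, but each step is one batched call."""
    if num_steps < 1:
        raise ValueError("num_steps must be positive")
    if any(not 0.0 <= t < 1.0 for t in starts):
        raise ValueError("start times must lie in [0, 1)")
    dts = [(1.0 - t0) / num_steps for t0 in starts]
    out = [list(x) for x in xs]
    for k in range(num_steps):
        # Per-sample current time, computed directly to avoid drift from summation.
        ts = [t0 + k * dt for t0, dt in zip(starts, dts)]
        out = euler_step(velocity, out, ts, dts)
    return out


if __name__ == "__main__":
    v = make_toy_velocity([1.0, -2.0])
    result = sample(v, [[0.0, 0.0], [0.5, 0.5], [3.0, 1.0]], starts=[0.0, 0.3, 0.6], num_steps=4)
    print(result)  # each row is approximately [1.0, -2.0]
```

Because the toy velocity field follows perfectly straight paths, even a coarse 4-step Euler schedule lands on the target for every sample regardless of its start time; a learned field would only approximate this, which is why straighter flows tolerate fewer steps.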

Problem

Accelerates Rectified Flow models for faster image generation

Overcomes limitations of existing diffusion model acceleration methods

Enhances efficiency in control, quality, and speed of generative models

Innovation

Batch processing with a new velocity field

Vectorization of heterogeneous time-step batch processing

Dynamic TensorRT compilation for flow models
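TensorRT supports dynamic input shapes through optimization profiles that declare minimum, optimal, and maximum dimensions for each input. The page does not say how StreamFlow selects shapes, so the following is only a sketch of one common serving pattern, offered as an assumption: keep a small set of compiled batch sizes, pad each incoming batch up to the nearest one, and discard the padded outputs. `COMPILED_BATCH_SIZES`, `pick_bucket`, and `run_bucketed` are hypothetical, and `run_engine` stands in for a real engine call.

```python
import bisect
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical batch sizes for which an engine (or optimization profile) was built.
COMPILED_BATCH_SIZES: List[int] = [1, 2, 4, 8, 16]


def pick_bucket(batch_size: int, buckets: Sequence[int] = COMPILED_BATCH_SIZES) -> int:
    """Return the smallest compiled batch size that can hold `batch_size`.
    `buckets` must be sorted in ascending order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    i = bisect.bisect_left(buckets, batch_size)
    if i == len(buckets):
        raise ValueError(f"batch of {batch_size} exceeds the largest bucket {buckets[-1]}")
    return buckets[i]


def run_bucketed(
    run_engine: Callable[[List[T]], List[T]],
    batch: List[T],
    pad_item: T,
    buckets: Sequence[int] = COMPILED_BATCH_SIZES,
) -> List[T]:
    """Pad `batch` to its bucket size, run the engine once, and drop padded outputs."""
    if not batch:
        return []
    size = pick_bucket(len(batch), buckets)
    padded = list(batch) + [pad_item] * (size - len(batch))
    outputs = run_engine(padded)
    return outputs[: len(batch)]


if __name__ == "__main__":
    seen_sizes: List[int] = []

    def fake_engine(xs: List[float]) -> List[float]:
        seen_sizes.append(len(xs))  # records the padded shape the engine receives
        return [2.0 * x for x in xs]

    print(run_bucketed(fake_engine, [1.0, 2.0, 3.0], pad_item=0.0))  # [2.0, 4.0, 6.0]
    print(seen_sizes)  # [4]
```

Power-of-two buckets trade a bounded amount of wasted padding (under 2× in the worst case) for a small number of compiled shapes, which keeps build time and engine memory manageable.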