VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing acceleration methods for rectified flow models reuse features cached at earlier steps; because the cache is static while the input keeps evolving, generation quality degrades. This work proposes VDE, a training-free approach that replaces “cache-and-reuse” with “decompose-and-estimate.” VDE decomposes the model's velocity into components parallel and orthogonal to the input and exploits their temporal predictability and directional stability to estimate the velocity adaptively, while periodic full forward passes suppress error accumulation. The method delivers substantial speedups in both image and video generation (for example, 3.22× on Flux) while maintaining high fidelity: on Qwen-Image it reaches an LPIPS of 0.069, a 52.2% reduction relative to the best baseline.

📝 Abstract

Although rectified flow models have achieved remarkable performance in image, video, and 3D generation, their practical deployment is hampered by slow inference. Prior acceleration methods reuse cached features from previous steps, which neglects the growing mismatch between static caches and the evolving input and leads to reduced output fidelity. This work proposes Velocity Decomposition and Estimation (VDE), a training-free acceleration method that shifts the paradigm from caching-and-reusing to decomposing-and-estimating. Specifically, VDE decomposes the model's velocity into components parallel and orthogonal to the input, exploiting their temporal predictability and directional stability for precise, input-adaptive estimation. To prevent error accumulation, it periodically anchors the model's state via full forward passes. Extensive experiments on image and video generation tasks demonstrate that VDE achieves substantial acceleration with minimal loss in visual quality. Notably, VDE accelerates Flux by 3.22× and achieves an LPIPS of 0.069 on Qwen-Image, a 52.2% reduction relative to the best baseline.
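The decompose-and-estimate idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `decompose`, `sample`, and `refresh_every`, the Euler update, the time convention (t running from 0 to 1), and above all the estimator used on skipped steps (hold the parallel coefficient and the orthogonal component from the last full pass, then rebuild the parallel part from the current input) are assumptions chosen for illustration. The abstract only states that the two components are estimated from their temporal predictability and directional stability, and that full forward passes are run periodically.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def decompose(v: Vector, x: Vector) -> Tuple[float, Vector]:
    """Split v into alpha * x (parallel to x) plus a remainder orthogonal to x."""
    xx = dot(x, x)
    alpha = dot(v, x) / xx if xx > 0.0 else 0.0
    v_orth = [vi - alpha * xi for vi, xi in zip(v, x)]
    return alpha, v_orth


def sample(
    model: Callable[[Vector, float], Vector],
    x: Vector,
    num_steps: int,
    refresh_every: int,
) -> Tuple[Vector, int]:
    """Euler sampling that runs `model` only on every `refresh_every`-th step."""
    dt = 1.0 / num_steps
    full_calls = 0
    alpha, v_orth = 0.0, [0.0] * len(x)
    for step in range(num_steps):
        t = step * dt
        if step % refresh_every == 0:
            # Full forward pass: anchors the state and refreshes both components.
            v = model(x, t)
            alpha, v_orth = decompose(v, x)
            full_calls += 1
        else:
            # Assumed estimator (not from the paper): keep alpha and v_orth from
            # the last full pass, but scale the *current* x, so the parallel
            # part follows the evolving input instead of a stale cache.
            v = [alpha * xi + oi for xi, oi in zip(x, v_orth)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x, full_calls
```

With `refresh_every=1` the loop reduces to ordinary Euler sampling with one model call per step; larger values trade model calls for estimated steps, which is the source of the speedup in this sketch.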

Problem

Research questions and friction points this paper is trying to address.

rectified flow

inference acceleration

feature caching

output fidelity

velocity estimation

Innovation

Methods, ideas, or system contributions that make the work stand out.

rectified flow

training-free acceleration

velocity decomposition