VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation

📅 2026-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing acceleration methods for rectified flow models degrade generation quality because their static caches cannot adapt to the evolving input. This work proposes VDE, a training-free acceleration approach that shifts the paradigm from “cache-and-reuse” to “decompose-and-estimate.” VDE decomposes the model's velocity into components parallel and orthogonal to the input, then exploits their temporal predictability and directional stability to estimate the velocity adaptively, while periodically performing full forward passes to suppress error accumulation. The method achieves substantial speedups in both image and video generation (for example, 3.22× acceleration on Flux) while maintaining high fidelity: it reaches an LPIPS of 0.069 on Qwen-Image, a 52.2% reduction relative to the best baseline.
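As a reading aid, the decomposition can be written as a standard vector projection. This is a sketch under assumptions: the symbols $x_t$ (current input), $v_t$ (model velocity), and $\alpha_t$ (projection coefficient) are notation introduced here, and the paper may define the components differently.

```latex
\begin{aligned}
v_t &= v_t^{\parallel} + v_t^{\perp}, \\
v_t^{\parallel} &= \alpha_t \, x_t, \qquad \alpha_t = \frac{\langle v_t, x_t \rangle}{\langle x_t, x_t \rangle}, \\
v_t^{\perp} &= v_t - \alpha_t \, x_t, \qquad \langle v_t^{\perp}, x_t \rangle = 0 .
\end{aligned}
```

Under this reading, the parallel part is fully described by the scalar $\alpha_t$, so it can be rebuilt against whatever the current input is, which is one plausible sense in which the estimate is input-adaptive.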
📝 Abstract
Though rectified flow models have achieved remarkable performance in image, video, and 3D generation, their practical deployment is hampered by slow inference. Prior acceleration methods reuse cached features from previous steps, which neglects the growing mismatch between static caches and the evolving input and reduces output fidelity. This work proposes Velocity Decomposition and Estimation (VDE), a training-free acceleration method that shifts the paradigm from caching-and-reusing to decomposing-and-estimating. Specifically, VDE decomposes the model's velocity into components parallel and orthogonal to the input, exploiting their temporal predictability and directional stability for precise, input-adaptive estimation. To prevent error accumulation, it periodically anchors the model's state via full forward passes. Extensive experiments on image and video generation tasks demonstrate that VDE achieves substantial acceleration with minimal loss in visual quality. Notably, VDE accelerates Flux by 3.22× and achieves an LPIPS of 0.069 on Qwen-Image, a 52.2% reduction relative to the best baseline.
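The decompose-and-estimate loop described above can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration: the function names, the `anchor_every` parameter, the Euler update, and above all the estimator used on skipped steps (holding the parallel coefficient and the orthogonal component from the last full pass). The abstract does not specify the actual estimator, so this should not be read as the authors' method.

```python
import numpy as np


def decompose_velocity(v, x, eps=1e-12):
    """Split v into parts parallel and orthogonal to x (treated as flat vectors)."""
    v_flat, x_flat = v.ravel(), x.ravel()
    # Projection coefficient of v onto x; eps guards against a zero input.
    alpha = float(v_flat @ x_flat) / (float(x_flat @ x_flat) + eps)
    v_par = alpha * x
    v_orth = v - v_par
    return alpha, v_par, v_orth


def sample_with_periodic_anchors(model, x, timesteps, anchor_every=3):
    """Euler integration of dx/dt = v, calling `model` only on anchor steps.

    ASSUMPTION (hypothetical estimator): on skipped steps the velocity is
    approximated as alpha * x_current + v_orth, with alpha and v_orth held
    fixed from the most recent full forward pass.
    """
    alpha, v_orth = None, None
    full_calls = 0
    for i in range(len(timesteps) - 1):
        t, t_next = timesteps[i], timesteps[i + 1]
        if i % anchor_every == 0:
            # Anchor step: full forward pass, then refresh the decomposition.
            v = model(x, t)
            full_calls += 1
            alpha, _, v_orth = decompose_velocity(v, x)
        else:
            # Skipped step: rebuild the parallel part against the current input.
            v = alpha * x + v_orth
        x = x + (t_next - t) * v
    return x, full_calls
```

With `anchor_every=3` and ten timesteps, the toy loop performs three full model calls out of nine integration steps. The real trade-off between skipped steps and fidelity depends on the estimator and schedule used in the paper, which this sketch does not reproduce.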
Problem

Research questions and friction points this paper is trying to address.

rectified flow
inference acceleration
feature caching
output fidelity
velocity estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

rectified flow
training-free acceleration
velocity decomposition
input-adaptive estimation
temporal predictability