🤖 AI Summary
Existing caching methods for Flow Matching rely on instantaneous velocity, which under high acceleration ratios often leads to trajectory deviation and error accumulation and degrades generation quality. To address this, this work proposes MeanCache, a training-free caching framework that, for the first time, replaces instantaneous velocity with interval-averaged velocity. It further integrates trajectory-stability scheduling and a peak-suppressed shortest-path algorithm to optimize cache timing and Jacobian-vector product (JVP) reuse under strict computational budgets. The method improves inference stability and efficiency, achieving speedups of 4.12×, 4.56×, and 3.59× on FLUX.1, Qwen-Image, and HunyuanVideo, respectively, while consistently outperforming existing caching baselines in generation quality.
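To make the interval-averaged velocity idea concrete, here is a minimal sketch and not the authors' implementation. It assumes a first-order expansion of the mean velocity over a step of size h, u ≈ v + (h/2)·dv/dt, where dv/dt is the derivative of v along the trajectory (the JVP of v with tangent (v, 1)). The toy field `velocity`, the finite-difference helper `total_derivative`, and the `cached_dvdt` parameter are all hypothetical names introduced for illustration; the actual MeanCache construction may differ.

```python
import math

def velocity(z, t):
    """Toy instantaneous velocity v(z, t) = -z (hypothetical stand-in for a model)."""
    return -z

def total_derivative(v_fn, z, t, eps=1e-5):
    """Central finite-difference stand-in for the JVP of v along the tangent (v, 1)."""
    v = v_fn(z, t)
    forward = v_fn(z + eps * v, t + eps)
    backward = v_fn(z - eps * v, t - eps)
    return (forward - backward) / (2.0 * eps)

def average_velocity(v_fn, z, t, h, cached_dvdt=None):
    """First-order estimate of the mean velocity over [t, t + h].

    Passing `cached_dvdt` mimics reusing a previously computed JVP
    instead of recomputing it (an assumption of this sketch).
    """
    dvdt = total_derivative(v_fn, z, t) if cached_dvdt is None else cached_dvdt
    return v_fn(z, t) + 0.5 * h * dvdt

# Compare against the exact mean velocity of dz/dt = -z over one large step.
z0, t0, h = 1.0, 0.0, 0.5
exact_mean = (z0 * math.exp(-h) - z0) / h
euler_err = abs(velocity(z0, t0) - exact_mean)
mean_err = abs(average_velocity(velocity, z0, t0, h) - exact_mean)
print(f"instantaneous error: {euler_err:.4f}, averaged error: {mean_err:.4f}")
```

On this toy problem the averaged estimate lands much closer to the exact interval mean than the instantaneous velocity does, which is the intuition behind taking large steps with an average velocity rather than a cached instantaneous one.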
📝 Abstract
We present MeanCache, a training-free caching framework for efficient Flow Matching inference. Existing caching methods reduce redundant computation but typically rely on instantaneous velocity information (e.g., feature caching), which often leads to severe trajectory deviations and error accumulation under high acceleration ratios. MeanCache introduces an average-velocity perspective: by leveraging cached Jacobian-vector products (JVPs) to construct interval average velocities from instantaneous velocities, it effectively mitigates local error accumulation. To further improve cache timing and the stability of JVP reuse, we develop a trajectory-stability scheduling strategy as a practical tool, employing a Peak-Suppressed Shortest Path under budget constraints to determine the schedule. Experiments on FLUX.1, Qwen-Image, and HunyuanVideo demonstrate that MeanCache achieves 4.12×, 4.56×, and 3.59× acceleration, respectively, while consistently outperforming state-of-the-art caching baselines in generation quality. We believe this simple yet effective approach provides a new perspective for Flow Matching inference and will inspire further exploration of stability-driven acceleration in commercial-scale generative models.
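The abstract does not spell out how the Peak-Suppressed Shortest Path is defined, so the following is only a hedged sketch of one way a budget-constrained schedule could be posed as a shortest-path problem. It assumes each timestep carries a hypothetical instability score (`step_cost`), that skipping a run of steps costs the sum of their scores, and that raising this sum to a power (`peak_power`, also hypothetical) penalises a single long risky skip more than several short ones. The function `plan_schedule` and all parameter names are illustrative, not taken from the paper.

```python
def plan_schedule(step_cost, budget, peak_power=2.0):
    """Choose `budget` full-compute steps minimising the peak-penalised skip cost.

    Step 0 is always computed, so step_cost[0] is never charged.
    Returns (sorted list of computed step indices, total cost).
    """
    n = len(step_cost)
    if not 1 <= budget <= n:
        raise ValueError("budget must be between 1 and the number of steps")

    def edge(i, j):
        # Compute at step i, then reuse the cache for steps i+1 .. j-1.
        skipped = sum(step_cost[i + 1:j])
        return skipped ** peak_power

    inf = float("inf")
    # best[k][j]: cheapest cost to arrive at node j after k full-compute steps.
    best = [[inf] * (n + 1) for _ in range(budget + 1)]
    prev = [[None] * (n + 1) for _ in range(budget + 1)]
    best[0][0] = 0.0
    for k in range(1, budget + 1):
        for j in range(1, n + 1):
            for i in range(j):
                if best[k - 1][i] == inf:
                    continue
                cand = best[k - 1][i] + edge(i, j)
                if cand < best[k][j]:
                    best[k][j] = cand
                    prev[k][j] = i

    # Walk back from the terminal node n to recover the computed steps.
    schedule, j = [], n
    for k in range(budget, 0, -1):
        j = prev[k][j]
        schedule.append(j)
    return schedule[::-1], best[budget][n]

# Six steps, three full computations allowed; steps 1 and 4 are the unstable ones.
steps, cost = plan_schedule([0.0, 5.0, 1.0, 1.0, 4.0, 1.0], budget=3)
print(steps, cost)  # [0, 1, 4] 5.0
```

In this toy instance the planner spends its budget on the two high-instability steps and caches through the calmer stretches, which matches the stated goal of stability-driven scheduling under a fixed compute budget.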