Flow caching for autoregressive video generation

📅 2026-02-11
🤖 AI Summary
Autoregressive video generation is slow because it processes video block by block, and existing caching methods cannot accommodate the different denoising requirements that different video blocks have at the same timestep. This work proposes FlowCache, which the authors describe as the first dynamic caching framework tailored for autoregressive video generation. It replaces the uniform-caching assumption of prior methods with a block-aware strategy in which each block is cached independently, and adds a KV cache compression mechanism jointly optimized for importance and redundancy. Under fixed memory constraints, FlowCache achieves 2.38× and 6.7× speedups on MAGI-1 and SkyReels-V2, respectively, with small changes in VBench score (+0.87 on MAGI-1, −0.79 on SkyReels-V2), making efficient, high-quality generation of ultra-long videos with autoregressive models more practical.
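The summary describes block-aware (chunkwise) caching only at a high level. The snippet below is a minimal sketch of the idea, assuming a TeaCache-style criterion applied separately to each chunk: every chunk accumulates the relative L1 change of its own input across denoising steps and reuses its last cached residual until that accumulated change reaches a threshold. The class `ChunkwiseCache`, the `threshold` value, the residual-reuse rule, and `toy_model` are hypothetical illustrations, not FlowCache's published implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]


def rel_l1(curr: Vector, prev: Vector) -> float:
    """Relative L1 change ||curr - prev||_1 / ||prev||_1 between two steps."""
    num = sum(abs(c - p) for c, p in zip(curr, prev))
    den = sum(abs(p) for p in prev) or 1e-12  # avoid division by zero
    return num / den


@dataclass
class ChunkState:
    """Per-chunk cache state (hypothetical)."""
    prev_input: Optional[Vector] = None
    residual: Optional[Vector] = None  # cached (output - input) from last full compute
    accumulated: float = 0.0           # relative change accumulated since that compute


class ChunkwiseCache:
    """Hypothetical sketch: every chunk keeps its own reuse/recompute policy.

    Uses a TeaCache-style accumulated relative-L1 test per chunk; this is an
    illustrative assumption, not FlowCache's published criterion.
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.states: Dict[int, ChunkState] = {}

    def step(self, chunk_id: int, x: Vector,
             compute: Callable[[Vector], Vector]) -> Tuple[Vector, bool]:
        """Return (output, reused) for one denoising step of one chunk."""
        st = self.states.setdefault(chunk_id, ChunkState())
        if st.prev_input is not None and st.residual is not None:
            st.accumulated += rel_l1(x, st.prev_input)
            if st.accumulated < self.threshold:
                st.prev_input = x
                # Cheap path: reapply this chunk's cached residual to the new input.
                return [xi + ri for xi, ri in zip(x, st.residual)], True
        # Expensive path: run the model and refresh only this chunk's cache.
        out = compute(x)
        st.prev_input = x
        st.residual = [o - xi for o, xi in zip(out, x)]
        st.accumulated = 0.0
        return out, False


calls = []


def toy_model(x: Vector) -> Vector:
    """Stand-in for a transformer forward pass; counts how often it runs."""
    calls.append(1)
    return [2.0 * v for v in x]


cache = ChunkwiseCache(threshold=0.1)
cache.step(0, [1.0, 1.0], toy_model)                    # chunk 0: computed
cache.step(1, [1.0, 1.0], toy_model)                    # chunk 1: computed
out0, reused0 = cache.step(0, [1.01, 1.0], toy_model)   # small change: reused
out1, reused1 = cache.step(1, [2.0, 2.0], toy_model)    # large change: recomputed
```

Keying the state by `chunk_id` is the point of the sketch: at the same step, a chunk whose input barely moved takes the cheap path while another chunk is recomputed, a distinction that a single global cache decision cannot express.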


📝 Abstract
Autoregressive models, often built on Transformer architectures, represent a powerful paradigm for generating ultra-long videos by synthesizing content in sequential chunks. However, this sequential generation process is notoriously slow. While caching strategies have proven effective for accelerating traditional video diffusion models, existing methods assume uniform denoising across all frames, an assumption that breaks down in autoregressive models, where different video chunks exhibit varying similarity patterns at identical timesteps. In this paper, we present FlowCache, the first caching framework specifically designed for autoregressive video generation. Our key insight is that each video chunk should maintain an independent caching policy, allowing fine-grained control over which chunks require recomputation at each timestep. We introduce a chunkwise caching strategy that dynamically adapts to the unique denoising characteristics of each chunk, complemented by a KV cache compression mechanism that jointly optimizes importance and redundancy, keeping memory within a fixed bound while preserving generation quality. Our method achieves speedups of 2.38× on MAGI-1 and 6.7× on SkyReels-V2 with negligible quality change (VBench: +0.87 on MAGI-1 and −0.79 on SkyReels-V2). These results demonstrate that FlowCache successfully unlocks the potential of autoregressive models for real-time, ultra-long video generation, establishing a new benchmark for efficient video synthesis at scale. The code is available at https://github.com/mikeallen39/FlowCache.
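The abstract states only that the KV cache compression jointly optimizes importance and redundancy under a fixed memory bound. As a sketch of what such a criterion can look like, the code below greedily keeps `budget` cached entries, scoring each by an importance value minus λ times its largest cosine similarity to entries already kept (a maximal-marginal-relevance style rule, swapped in here for illustration). The function `compress_kv`, the parameter `lam`, and the importance scores (for example, accumulated attention each cached token has received) are assumptions; this is not the paper's algorithm.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def compress_kv(keys: Sequence[Sequence[float]], importance: Sequence[float],
                budget: int, lam: float = 0.5) -> List[int]:
    """Hypothetical sketch: choose at most `budget` KV entries to keep.

    Greedy maximal-marginal-relevance selection: each step keeps the entry
    with the highest importance minus `lam` times its largest cosine
    similarity to keys already kept. Returns kept indices in original order.
    """
    n = len(keys)
    if budget >= n:
        return list(range(n))  # already within the memory bound
    kept: List[int] = []
    remaining = list(range(n))
    while len(kept) < budget:
        best_i, best_score = -1, -math.inf
        for i in remaining:
            redundancy = max((cosine(keys[i], keys[j]) for j in kept), default=0.0)
            score = importance[i] - lam * redundancy
            if score > best_score:  # strict '>' breaks ties toward the lower index
                best_i, best_score = i, score
        kept.append(best_i)
        remaining.remove(best_i)
    return sorted(kept)


# Toy example: entries 0 and 1 have identical keys, entry 2 is distinct.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
importance = [1.0, 0.9, 0.5]
print(compress_kv(keys, importance, budget=2, lam=0.5))  # [0, 2]
print(compress_kv(keys, importance, budget=2, lam=0.0))  # [0, 1]
```

With λ = 0 the rule keeps the two most important entries even though they are duplicates; with λ = 0.5 the duplicate is dropped in favor of the less important but distinct third key, which is the trade-off a joint importance-redundancy criterion is meant to capture.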
Problem

Research questions and friction points this paper is trying to address.

autoregressive video generation
flow caching
video synthesis
generation speed
chunkwise denoising
Innovation

Methods, ideas, or system contributions that make the work stand out.

flow caching
autoregressive video generation
chunkwise caching
KV cache compression
efficient video synthesis
Yuexiao Ma
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China; ByteDance
Xuzhe Zheng
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China
Jing Xu
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China
Xiwei Xu
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China
Feng Ling
ByteDance
Xiawu Zheng
Associate Professor, IEEE Senior Member, Xiamen University
Automated Machine Learning, Network Compression, Neural Architecture Search, AutoML
Huafeng Kuang
ByteDance Inc.
Multimodal Understanding and Generation, Adversarial Robustness
Huixia Li
ByteDance
Xing Wang
ByteDance
image processing, deep learning, computer vision
Xuefeng Xiao
ByteDance Seed
Computer Vision, Efficient AI
Fei Chao
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China
Rongrong Ji
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China