AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection

2025-02-08
AI Summary
Text-driven long-video editing is severely constrained by GPU memory overhead, which makes it infeasible to process videos longer than a few hundred frames. To address this, we propose a training-free framework with two components. The first is an adaptive attention slimming mechanism that dynamically compresses the key-value (KV) sequence, so that many more keyframes can be translated jointly. The second is an adaptive keyframe selection strategy that balances semantic representativeness and temporal coherence, combining token-level importance scoring, multi-scale inter-frame similarity modeling, and interpolation-based refinement. We further construct LongV-EVAL, a new high-quality benchmark for long-video editing. On a single A800 GPU, our method edits videos of more than 1,000 frames (minute-scale) in one inference pass, roughly ten times longer than TokenFlow, while also improving generation quality.
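
To make the idea of choosing keyframes from inter-frame similarity concrete, here is a minimal sketch that greedily starts a new keyframe whenever a frame drifts too far from the last selected one. It is a hypothetical illustration, not AdaFlow's selection algorithm: the function name `select_keyframes`, the `threshold` parameter, and the use of a single cosine similarity on per-frame feature vectors are all assumptions, and it omits the token-level importance scoring, multi-scale similarity, and interpolation-based refinement mentioned above.

```python
import numpy as np


def select_keyframes(frame_feats: np.ndarray, threshold: float = 0.9) -> list:
    """Greedy similarity-based keyframe selection (hypothetical sketch).

    frame_feats: (num_frames, dim) array with one feature vector per frame.
    A frame becomes a new keyframe when its cosine similarity to the most
    recently selected keyframe falls below `threshold`.
    """
    if len(frame_feats) == 0:
        return []
    # L2-normalise rows so that dot products are cosine similarities.
    norms = np.linalg.norm(frame_feats, axis=1, keepdims=True)
    feats = frame_feats / np.maximum(norms, 1e-8)
    keyframes = [0]  # always keep the first frame
    for i in range(1, len(feats)):
        if float(feats[i] @ feats[keyframes[-1]]) < threshold:
            keyframes.append(i)
    return keyframes


# Toy example: five frames whose 2-D features switch between two "scenes".
feats = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)
print(select_keyframes(feats))  # [0, 2, 4]
```

Raising `threshold` yields more (denser) keyframes, lowering it yields fewer; a real system would tune this trade-off against the memory available for jointly editing the selected frames.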

πŸ“ Abstract
Despite great progress, text-driven long video editing is still notoriously challenging, mainly due to excessive memory overhead. Although recent efforts have simplified this task into a two-step process of keyframe translation and interpolation generation, token-wise keyframe translation still caps the achievable video length. In this paper, we propose a novel, training-free approach to efficient and effective long video editing, termed AdaFlow. We first reveal that not all tokens of video frames are equally important for keyframe translation. Based on this observation, we propose an Adaptive Attention Slimming scheme that squeezes the KV sequence, increasing the number of keyframes that can be translated by an order of magnitude. In addition, AdaFlow is equipped with an Adaptive Keyframe Selection scheme that selects representative frames for joint editing, further improving generation quality. With these designs, AdaFlow achieves high-quality editing of minutes-long videos in one inference, i.e., more than 1k frames on one A800 GPU, which is about ten times longer than compared methods such as TokenFlow. To validate AdaFlow, we also build a new benchmark for long video editing with high-quality annotations, termed LongV-EVAL. Our code is released at: https://github.com/jidantang55/AdaFlow.
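
The observation that not all tokens matter equally suggests pruning the key/value sequence before attention. The sketch below shows one generic way to do this: score each key by the mean attention weight it receives across queries, keep the top fraction, and attend only to the retained tokens. This is an assumption-laden illustration, not AdaFlow's actual criterion: the names `slim_kv` and `keep_ratio` and the mean-attention importance rule are hypothetical, and the sketch computes the full attention matrix for scoring purely for clarity, which a memory-saving implementation would avoid.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # Standard scaled dot-product attention: (n_q, d) x (n_kv, d) -> (n_q, d_v).
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def slim_kv(q, k, v, keep_ratio=0.25):
    """Drop low-importance key/value tokens (hypothetical importance rule).

    q: (n_q, d) queries of the frame being edited.
    k, v: (n_kv, d) keys/values, e.g. tokens of all keyframes concatenated.
    Importance of a key = mean attention weight it receives across queries.
    Returns the retained keys, values, and their indices in original order.
    """
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_q, n_kv)
    importance = weights.mean(axis=0)                   # (n_kv,)
    n_keep = max(1, int(round(keep_ratio * k.shape[0])))
    top = np.argsort(importance)[::-1][:n_keep]         # most important first
    idx = np.sort(top)                                  # restore token order
    return k[idx], v[idx], idx


rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16))
k = rng.standard_normal((64, 16))
v = rng.standard_normal((64, 16))
k_s, v_s, idx = slim_kv(q, k, v, keep_ratio=0.25)
print(k_s.shape, attention(q, k_s, v_s).shape)  # (16, 16) (8, 16)
```

With `keep_ratio=0.25`, each query attends to a quarter of the original tokens, so the attention-score matrix shrinks by the same factor; the freed memory is what could be spent on attending to more keyframes.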
Problem

Reduces memory overhead in long video editing
Enhances keyframe translation efficiency
Improves video generation quality and length
Innovation

Adaptive Attention Slimming scheme
Adaptive Keyframe Selection scheme
Training-free long video editing
πŸ”Ž Similar Papers
No similar papers found.
Shuheng Zhang
Master Student, Xiamen University
Multi-modal Generation, Computer Vision
Yuqi Liu
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China; Institute of Artificial Intelligence, Xiamen University, 361005, P.R. China
Hongbo Zhou
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China; Institute of Artificial Intelligence, Xiamen University, 361005, P.R. China
Jun Peng
PhD, Soochow University, Australian National University
Photovoltaics
Yiyi Zhou
Xiamen University
deep learning, language and vision
Xiaoshuai Sun
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China; Institute of Artificial Intelligence, Xiamen University, 361005, P.R. China
Rongrong Ji
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China; Institute of Artificial Intelligence, Xiamen University, 361005, P.R. China