StreamGVE: Training-Free Video Editing via Few-Step Streaming Video Generation

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing video editing methods struggle to deliver both high quality and fidelity to user intent in few-step generation, and often rely on time-consuming iterative optimization. This work proposes a training-free streaming video editing framework built on a pretrained streaming generative model. It combines dual-branch few-step sampling, a self-attention bridge, cross-attention grounding and boosting, source-oriented guidance, and a visual prompting strategy, moving away from the conventional “data-to-data” paradigm toward noise-to-data generation. The authors report that the method consistently outperforms existing approaches across diverse editing tasks in few-step settings with minimal time cost, and that it generalizes across different models.

📝 Abstract

Although existing video editing methods are generally feasible, they often require many costly iterations and still struggle to deliver high-quality, satisfying editing results. We attribute this limitation to the prevalent data-to-data paradigm, which is less compatible with modern generative models than noise-to-data generation. To address this gap, we revisit video editing from a noise-to-data perspective and propose Streaming-Generation-based Video Editing (StreamGVE), which preserves few-step sampling while seamlessly injecting source-video conditions. Built on pre-trained streaming generation models, StreamGVE introduces dual-branch fast sampling with a self-attention bridge and cross-attention grounding/boosting to satisfy both sampling and conditioning requirements. We further propose source-oriented guidance to improve target-generation quality, and a visual prompting strategy to enhance editing flexibility and practicality. The method is effective, robust, and generalizable across different models. Extensive experiments on diverse video editing tasks show that StreamGVE consistently outperforms existing approaches, even in few-step settings with minimal time cost.

Problem

Research questions and friction points this paper is trying to address.

video editing

high-quality editing

costly iterations

editing results

generative models

Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free video editing

noise-to-data generation

streaming video generation