StreamGVE: Training-Free Video Editing via Few-Step Streaming Video Generation

📅 2026-05-20
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing video editing methods struggle to simultaneously achieve high quality and fidelity to user intent in few-step generation, often relying on time-consuming iterative optimization. This work proposes a training-free, streaming video editing framework built upon a pretrained streaming generative model. By integrating dual-branch few-step sampling, self-attention bridging, cross-attention anchoring and enhancement, source-oriented guidance, and visual prompting strategies, the approach transcends the limitations of conventional “data-to-data” paradigms. The method demonstrates significant performance gains over state-of-the-art techniques across diverse editing tasks, achieving high-quality results with remarkable efficiency and strong generalization capabilities in few-step video editing.
📝 Abstract
Although existing video editing methods are generally feasible, they often require many costly iterations and still struggle to deliver high-quality yet satisfying editing results. We attribute this limitation to the prevalent data-to-data paradigm, which is less compatible with modern generative models than noise-to-data generation. To address this gap, we revisit video editing from a noise-to-data perspective and propose Streaming-Generation-based Video Editing (StreamGVE), which preserves few-step sampling while seamlessly injecting source-video conditions. Built on pre-trained streaming generation models, StreamGVE introduces dual-branch fast sampling with a self-attention bridge and cross-attention grounding/boosting to satisfy both sampling and conditioning requirements. We further propose source-oriented guidance to improve target-generation quality, and a visual prompting strategy to enhance editing flexibility and practicality. The method is effective, robust, and generalizable across different models. Extensive experiments on diverse video editing tasks show that StreamGVE consistently outperforms existing approaches, even in few-step settings with minimal time cost.
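The abstract names a "self-attention bridge" between the two sampling branches but does not describe its mechanics. As a rough illustration only, the sketch below shows one generic pattern used in training-free editing: queries from the target branch attend over keys and values from both the source and target branches. The function names (`attention`, `bridged_self_attention`) and the concatenation scheme are assumptions made for this example and are not taken from StreamGVE.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        # One score per key, then normalize into attention weights.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors, computed per output dimension.
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def bridged_self_attention(q_tgt, k_tgt, v_tgt, k_src, v_src):
    """Hypothetical bridge (an assumption, not the paper's definition):
    target queries see source tokens alongside target tokens."""
    return attention(q_tgt, k_src + k_tgt, v_src + v_tgt)


# Toy example: one target query that aligns with the source key.
q_tgt = [[1.0, 0.0]]
k_tgt, v_tgt = [[0.0, 1.0]], [[0.0, 10.0]]
k_src, v_src = [[1.0, 0.0]], [[10.0, 0.0]]

out = bridged_self_attention(q_tgt, k_tgt, v_tgt, k_src, v_src)
print(out)  # the first component dominates because the query matches the source key
```

With an empty source branch (`k_src = []`, `v_src = []`), the function reduces to ordinary self-attention on the target tokens, which makes the source contribution easy to isolate in this toy setting.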
Problem

Research questions and friction points this paper is trying to address.

video editing
high-quality editing
costly iterations
editing results
generative models
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free video editing
noise-to-data generation
streaming video generation
few-step sampling
visual prompting