🤖 AI Summary
Current instruction-guided video editing is hindered by two major bottlenecks: the scarcity of high-quality training data and the absence of standardized, human-aligned evaluation benchmarks. To address these challenges, we introduce OpenVE-3M—the first large-scale, multi-type instruction-based video editing dataset comprising both spatially aligned and unaligned pairs. It features a fine-grained taxonomy covering eight distinct editing operations and employs a high-precision automated generation pipeline coupled with rigorous quality filtering. Concurrently, we release OpenVE-Bench—the first unified evaluation benchmark exhibiting strong correlation with human judgments. Leveraging OpenVE-3M, we train OpenVE-Edit, a 5B-parameter open-source model that achieves state-of-the-art performance on OpenVE-Bench, significantly outperforming all existing open-source models—including a 14B-parameter baseline—while markedly improving editing accuracy and inference efficiency.
📝 Abstract
The quality and diversity of instruction-based image editing datasets are continuously increasing, yet large-scale, high-quality datasets for instruction-based video editing remain scarce. To address this gap, we introduce OpenVE-3M, an open-source, large-scale, and high-quality dataset for instruction-based video editing. It comprises two primary categories: spatially-aligned edits (Global Style, Background Change, Local Change, Local Remove, Local Add, and Subtitles Edit) and non-spatially-aligned edits (Camera Multi-Shot Edit and Creative Edit). All edit types are generated via a meticulously designed data pipeline with rigorous quality filtering. OpenVE-3M surpasses existing open-source datasets in scale, diversity of edit types, instruction length, and overall quality. Furthermore, to address the lack of a unified benchmark in the field, we construct OpenVE-Bench, containing 431 video-edit pairs that cover a diverse range of editing tasks, with three key metrics highly aligned with human judgment. We present OpenVE-Edit, a 5B-parameter model trained on our dataset that demonstrates remarkable efficiency and effectiveness, setting a new state-of-the-art on OpenVE-Bench and outperforming all prior open-source models, including a 14B baseline. The project page is at https://github.com/lewandofskee/OpenVE.
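The abstract does not specify the dataset's on-disk format, so as a purely illustrative sketch (field names, paths, and the `EditSample` class are assumptions, not the released schema), one sample in the taxonomy above might be modeled as:

```python
from dataclasses import dataclass

# The eight edit types named in the abstract, grouped by category.
SPATIALLY_ALIGNED = {
    "Global Style", "Background Change", "Local Change",
    "Local Remove", "Local Add", "Subtitles Edit",
}
NON_SPATIALLY_ALIGNED = {"Camera Multi-Shot Edit", "Creative Edit"}


@dataclass
class EditSample:
    """Hypothetical record layout for one instruction-video editing pair."""
    instruction: str    # natural-language edit instruction
    source_video: str   # path to the input clip (illustrative)
    edited_video: str   # path to the edited result (illustrative)
    edit_type: str      # one of the eight types above

    def is_spatially_aligned(self) -> bool:
        return self.edit_type in SPATIALLY_ALIGNED


sample = EditSample(
    instruction="Replace the daytime sky with a starry night.",
    source_video="clips/0001_src.mp4",
    edited_video="clips/0001_edit.mp4",
    edit_type="Background Change",
)
print(sample.is_spatially_aligned())  # → True
```

The split mirrors the paper's two primary categories: spatially-aligned edits preserve the frame layout of the source clip, while the two non-spatially-aligned types do not.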