🤖 AI Summary
Existing video editing benchmarks suffer from limited source video diversity, narrow task coverage, and incomplete evaluation metrics, hindering systematic assessment of instruction-guided video editing. To address this, we introduce IVEBench, a modern benchmark suite tailored to this task, comprising 600 high-quality source videos and 8 categories of editing tasks with 35 fine-grained subcategories, enabling evaluation of complex semantic understanding and multi-step instruction following. IVEBench establishes a three-dimensional evaluation protocol covering video quality, instruction compliance, and video fidelity, integrating traditional metrics with automated scoring from multimodal large language models and achieving strong alignment with human judgments. Editing prompts are generated by large language models and refined through expert review to ensure semantic accuracy and task feasibility. Extensive experiments demonstrate that IVEBench effectively discriminates among state-of-the-art methods, substantially improving the systematicity, reliability, and generalizability of video editing evaluation.
📝 Abstract
Instruction-guided video editing has emerged as a rapidly advancing research direction, offering new opportunities for intuitive content transformation while also posing significant challenges for systematic evaluation. Existing video editing benchmarks fail to adequately support the evaluation of instruction-guided video editing, and further suffer from limited source diversity, narrow task coverage, and incomplete evaluation metrics. To address the above limitations, we introduce IVEBench, a modern benchmark suite specifically designed for instruction-guided video editing assessment. IVEBench comprises a diverse database of 600 high-quality source videos spanning seven semantic dimensions and covering video lengths ranging from 32 to 1,024 frames. It further includes 8 categories of editing tasks with 35 subcategories, whose prompts are generated and refined through large language models and expert review. Crucially, IVEBench establishes a three-dimensional evaluation protocol encompassing video quality, instruction compliance, and video fidelity, integrating both traditional metrics and multimodal large language model-based assessments. Extensive experiments demonstrate the effectiveness of IVEBench in benchmarking state-of-the-art instruction-guided video editing methods, showing its ability to provide comprehensive and human-aligned evaluation outcomes.
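The three-dimensional protocol described above can be pictured as per-dimension scores combined into one benchmark result. The sketch below is a minimal, hypothetical illustration (the function name, weighting scheme, and score ranges are assumptions, not taken from the paper): each edited video gets a video-quality, instruction-compliance, and fidelity score in [0, 1], which are then averaged.

```python
# Hypothetical sketch of a three-dimensional score aggregation in the spirit
# of IVEBench. All names and the equal-weight choice are illustrative
# assumptions; the paper's actual aggregation may differ.

def aggregate_score(quality: float,
                    instruction_compliance: float,
                    fidelity: float,
                    weights: tuple = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Combine the three dimension scores (each assumed to lie in [0, 1])."""
    dims = (quality, instruction_compliance, fidelity)
    if not all(0.0 <= d <= 1.0 for d in dims):
        raise ValueError("dimension scores must lie in [0, 1]")
    # Weighted average over the three evaluation dimensions.
    return sum(w * d for w, d in zip(weights, dims))

# Example: a method that follows instructions well but loses some fidelity.
score = aggregate_score(quality=0.82, instruction_compliance=0.91, fidelity=0.77)
```

With equal weights this is just the mean of the three dimensions; a benchmark could instead weight instruction compliance more heavily if edit correctness is the primary concern.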