🤖 AI Summary
Text-based video editing currently lacks fine-grained, multi-dimensional, and unified evaluation standards; aggregate scores obscure task-specific performance disparities. To address this, we propose EditBoard, the first comprehensive benchmark for text-based video editing, built around a task-oriented, multi-dimensional evaluation framework that covers four representative editing categories: local editing, object replacement, attribute modification, and motion transfer. EditBoard defines nine automated metrics across four dimensions (visual quality, temporal consistency, text-video alignment, and editing fidelity), including three new fidelity metrics, EditFID, EditCLIP, and EditTemporal, designed specifically for video editing assessment. This decomposition attributes performance to individual capabilities and reveals significant variation across models and editing tasks. The benchmark's codebase and standardized evaluation protocol are publicly released to foster reproducibility and community-wide evaluation standardization.
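To make the fidelity dimension concrete, here is a minimal sketch of an EditCLIP-style score, assuming the common community recipe of cosine similarity between CLIP image embeddings of paired source and edited frames. The paper's actual EditFID, EditCLIP, and EditTemporal definitions may differ; the function names and the checkpoint choice below are illustrative assumptions, not EditBoard's released code.

```python
# Illustrative sketch only: a CLIP-based fidelity signal in the spirit of
# EditCLIP, NOT EditBoard's exact metric. Assumes the HuggingFace
# `transformers` CLIP checkpoint "openai/clip-vit-base-patch32".
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed_frames(frames: list[Image.Image]) -> torch.Tensor:
    """L2-normalized CLIP image embeddings for a list of frames, shape [N, 512]."""
    pixel_values = processor(images=frames, return_tensors="pt").pixel_values
    feats = model.get_image_features(pixel_values=pixel_values)
    return feats / feats.norm(dim=-1, keepdim=True)

def edit_clip_score(source_frames: list[Image.Image],
                    edited_frames: list[Image.Image]) -> float:
    """Mean per-frame cosine similarity between paired source and edited frames.

    Higher values indicate the edit preserves more of the source content,
    which is the intuition behind a fidelity-oriented metric.
    """
    assert len(source_frames) == len(edited_frames), "frame lists must be paired"
    src = embed_frames(source_frames)
    out = embed_frames(edited_frames)
    return (src * out).sum(dim=-1).mean().item()
```

A score like this rewards edits that stay close to the source, so in practice it is read jointly with an alignment score: fidelity alone would favor a model that changes nothing.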
📝 Abstract
The rapid development of diffusion models has significantly advanced AI-generated content (AIGC), particularly in Text-to-Image (T2I) and Text-to-Video (T2V) generation. Text-based video editing, leveraging these generative capabilities, has emerged as a promising field, enabling precise modifications to videos based on text prompts. Despite the proliferation of innovative video editing models, there is a conspicuous lack of comprehensive evaluation benchmarks that holistically assess these models' performance across various dimensions. Existing evaluations are limited and inconsistent, typically summarizing overall performance with a single score, which obscures models' effectiveness on individual editing tasks. To address this gap, we propose EditBoard, the first comprehensive evaluation benchmark for text-based video editing models. EditBoard encompasses nine automatic metrics across four dimensions, evaluating models on four task categories and introducing three new metrics to assess fidelity. This task-oriented benchmark facilitates objective evaluation by detailing model performance and providing insights into each model's strengths and weaknesses. By open-sourcing EditBoard, we aim to standardize evaluation and advance the development of robust video editing models.
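For the temporal-consistency and text-video-alignment dimensions the abstract mentions, the sketches below follow the standard CLIP-based recipes used in the video-editing literature (mean cosine similarity between consecutive frame embeddings, and mean frame-prompt similarity). They are hedged stand-ins for illustration; EditBoard's own nine metrics are specified in the paper and open-sourced code.

```python
# Minimal, self-contained sketches of two common evaluation dimensions,
# assuming the standard CLIP recipes rather than EditBoard's definitions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def temporal_consistency(frames: list[Image.Image]) -> float:
    """Mean cosine similarity of CLIP embeddings for consecutive frame pairs."""
    pixel_values = processor(images=frames, return_tensors="pt").pixel_values
    feats = model.get_image_features(pixel_values=pixel_values)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    # Compare frame t with frame t+1 across the whole clip.
    return (feats[:-1] * feats[1:]).sum(dim=-1).mean().item()

@torch.no_grad()
def text_alignment(frames: list[Image.Image], prompt: str) -> float:
    """Mean cosine similarity between each edited frame and the editing prompt."""
    pixel_values = processor(images=frames, return_tensors="pt").pixel_values
    img_feats = model.get_image_features(pixel_values=pixel_values)
    img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
    tok = processor(text=[prompt], return_tensors="pt", padding=True)
    txt_feats = model.get_text_features(input_ids=tok.input_ids,
                                        attention_mask=tok.attention_mask)
    txt_feats = txt_feats / txt_feats.norm(dim=-1, keepdim=True)
    return (img_feats @ txt_feats.T).mean().item()
```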