EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education

📅 2026-03-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of specialized benchmarks for evaluating the quality of AI-generated videos (AIGVs) in educational contexts, specifically elementary mathematics instruction. To fill this gap, the authors introduce EduAIGV-1k, the first education-oriented AIGV evaluation benchmark. It comprises 1,130 instructional videos generated by ten state-of-the-art text-to-video models, each with fine-grained human annotations on perceptual quality and prompt alignment. Building on this dataset, they propose EduVQA, a multidimensional and interpretable video quality assessment model that is trained on word-level and sentence-level prompt alignment annotations and uses a Structured 2D Mixture-of-Experts (S2D-MoE) module to capture dependencies between overall quality and its sub-dimensions. Experiments show that EduVQA consistently outperforms existing VQA baselines. The dataset and code will be publicly released to support research on educational AIGC evaluation.

📝 Abstract

While AI-generated content (AIGC) models have achieved remarkable success in generating photorealistic videos, their potential to support visual, story-driven learning in education remains largely untapped. To close this gap, we present EduAIGV-1k, the first benchmark dataset and evaluation framework dedicated to assessing the quality of AI-generated videos (AIGVs) designed to teach foundational math concepts, such as numbers and geometry, to young learners. EduAIGV-1k contains 1,130 short videos produced by ten state-of-the-art text-to-video (T2V) models using 113 pedagogy-oriented prompts. Each video is accompanied by fine-grained annotations along two complementary axes: (1) perceptual quality, disentangled into spatial and temporal fidelity, and (2) prompt alignment, labeled at the word and sentence levels to quantify how accurately each mathematical concept in the prompt is grounded in the generated video. These annotations turn each video into a multi-dimensional, interpretable supervision signal rather than a single quality score. Leveraging this dense feedback, we introduce EduVQA for both perceptual and alignment quality assessment of AIGVs. In particular, we propose a Structured 2D Mixture-of-Experts (S2D-MoE) module, which strengthens the dependency between overall quality and each sub-dimension through shared experts and a dynamic 2D gating matrix. Extensive experiments show that EduVQA consistently outperforms existing VQA baselines. Both our dataset and code will be publicly available.
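
To make the dataset layout described in the abstract concrete, here is a minimal Python sketch of what one annotated video record might look like. The class name, field names, and score types are assumptions for illustration only; the released dataset may use a different schema. The count check relies on the abstract's figures (113 prompts, ten models, 1,130 videos), which are consistent with one video per prompt and model pair.

```python
from dataclasses import dataclass, field
from typing import Dict

NUM_PROMPTS = 113  # pedagogy-oriented prompts (from the abstract)
NUM_MODELS = 10    # text-to-video models (from the abstract)


@dataclass
class VideoAnnotation:
    """Hypothetical record for one annotated video; all field names are assumptions."""
    prompt_id: int
    model_id: int
    spatial_quality: float     # perceptual quality: spatial fidelity
    temporal_quality: float    # perceptual quality: temporal fidelity
    sentence_alignment: float  # sentence-level prompt alignment
    # Word-level prompt alignment, keyed by the prompt word being judged.
    word_alignment: Dict[str, float] = field(default_factory=dict)


def expected_video_count(num_prompts: int = NUM_PROMPTS,
                         num_models: int = NUM_MODELS) -> int:
    """Number of videos if every model renders every prompt exactly once."""
    return num_prompts * num_models
```

With the defaults, `expected_video_count()` gives 113 × 10 = 1,130, matching the dataset size stated above.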

Problem

Research questions and friction points this paper is trying to address.

AI-generated video

video quality assessment

education

prompt alignment

perceptual quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-generated video

educational benchmark

fine-grained annotation

prompt alignment

Structured 2D Mixture-of-Experts
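
The abstract describes the S2D-MoE module only as combining shared experts with a dynamic 2D gating matrix, so the architecture itself is not specified here. The sketch below illustrates the generic idea under stated assumptions, and is not the authors' implementation: a set of experts is shared across all quality dimensions, and a 2D gate with one row per dimension and one column per expert decides how each dimension mixes the expert outputs. The function names, the scalar-valued experts, and the row-wise softmax are all hypothetical choices. In a dynamic variant the gate logits would be predicted from the input; here they are passed in directly to keep the example small.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_mixture(features: Vector,
                  experts: List[Callable[[Vector], float]],
                  gate_logits: List[List[float]]) -> List[float]:
    """Mix shared expert outputs with a 2D gate (illustrative assumption).

    gate_logits[d][e] is the logit for quality dimension d and expert e.
    Each row is softmax-normalized, so every dimension receives its own
    convex combination of the same shared expert outputs.
    """
    # Every expert is evaluated once and reused by all dimensions.
    expert_outputs = [expert(features) for expert in experts]
    scores = []
    for row in gate_logits:
        weights = softmax(row)
        scores.append(sum(w * o for w, o in zip(weights, expert_outputs)))
    return scores


# Toy usage: two stand-in experts and three dimensions
# (for example overall, spatial, temporal).
toy_experts = [lambda x: float(sum(x)), lambda x: float(max(x))]
toy_gate = [[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]]
toy_scores = gated_mixture([1.0, 2.0, 3.0], toy_experts, toy_gate)
```

Because all rows of the gate read from the same expert outputs, the per-dimension scores are coupled through the shared experts, which is one simple way to model a dependency between overall quality and its sub-dimensions.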
