PhyEduVideo: A Benchmark for Evaluating Text-to-Video Models for Physics Education

📅 2026-01-02
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that current text-to-video (T2V) models struggle to simultaneously ensure visual coherence and physical accuracy in educational contexts, particularly due to the absence of domain-specific evaluation benchmarks. To bridge this gap, we propose the first T2V evaluation framework tailored for physics education, decomposing core concepts—spanning mechanics, fluids, optics, electromagnetism, and thermodynamics—into fine-grained instructional units. We construct a structured prompt system and integrate both human and automated metrics to jointly assess the visual quality and conceptual fidelity of generated videos. Experimental results reveal that while existing models produce visually coherent outputs, they exhibit significantly higher error rates in representing abstract physical principles, especially in electromagnetism and thermodynamics, thereby highlighting the critical challenge of ensuring content reliability for educational applications.

📝 Abstract
Generative AI models, particularly Text-to-Video (T2V) systems, offer a promising avenue for transforming science education by automating the creation of engaging and intuitive visual explanations. In this work, we take a first step toward evaluating their potential in physics education by introducing a dedicated benchmark for explanatory video generation. The benchmark is designed to assess how well T2V models can convey core physics concepts through visual illustrations. Each physics concept in our benchmark is decomposed into granular teaching points, with each point accompanied by a carefully crafted prompt intended for visual explanation of that teaching point. T2V models are evaluated on their ability to generate accurate videos in response to these prompts. Our aim is to systematically explore the feasibility of using T2V models to generate high-quality, curriculum-aligned educational content, paving the way toward scalable, accessible, and personalized learning experiences powered by AI. Our evaluation reveals that current models produce visually coherent videos with smooth motion and minimal flickering, yet their conceptual accuracy is less reliable. Performance in areas such as mechanics, fluids, and optics is encouraging, but models struggle with electromagnetism and thermodynamics, where abstract interactions are harder to depict. These findings underscore the gap between visual quality and conceptual correctness in educational video generation. We hope this benchmark helps the community close that gap and move toward T2V systems that can deliver accurate, curriculum-aligned physics content at scale. The benchmark and accompanying codebase are publicly available at https://github.com/meghamariamkm/PhyEduVideo.
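The benchmark's structure described above — teaching points paired with prompts, with generated videos scored separately for visual quality and conceptual fidelity, then aggregated per topic — can be sketched as follows. All names and fields here are illustrative assumptions, not the paper's actual schema or code:

```python
from dataclasses import dataclass

@dataclass
class TeachingPoint:
    # One fine-grained instructional unit within a physics concept
    # (hypothetical structure; field names are not from the paper).
    concept: str   # e.g. "mechanics"
    point: str     # e.g. "projectile motion follows a parabolic path"
    prompt: str    # the T2V prompt crafted to illustrate this point

def mean_scores(results):
    """Average per-topic scores from (topic, visual, conceptual) records.

    Returns {topic: (mean_visual, mean_conceptual)}.
    """
    totals = {}
    for topic, visual, conceptual in results:
        v, c, n = totals.get(topic, (0.0, 0.0, 0))
        totals[topic] = (v + visual, c + conceptual, n + 1)
    return {t: (v / n, c / n) for t, (v, c, n) in totals.items()}

# Toy scores mirroring the paper's qualitative finding: visual quality
# stays high across topics, while conceptual accuracy drops sharply
# for abstract domains such as electromagnetism.
results = [
    ("mechanics", 0.9, 0.8),
    ("mechanics", 0.8, 0.6),
    ("electromagnetism", 0.9, 0.3),
]
print(mean_scores(results))
```

The two-axis aggregation makes the paper's central observation measurable: a model can rank well on the visual axis while ranking poorly on the conceptual one, and averaging per topic exposes which domains drive the gap.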
Problem

Research questions and friction points this paper is trying to address.

Text-to-Video
Physics Education
Conceptual Accuracy
Educational Video Generation
AI Benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-to-Video
Physics Education
Benchmark
Conceptual Accuracy
Generative AI