ConfusionBench: An Expert-Validated Benchmark for Confusion Recognition and Localization in Educational Videos

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing datasets for detecting student confusion in educational videos often suffer from label noise, coarse temporal annotations, and a lack of expert validation, hindering fine-grained recognition and precise temporal localization. To address these limitations, this work proposes a multi-stage data curation pipeline that integrates model-assisted filtering, manual verification, and expert validation to construct ConfusionBench—a high-quality benchmark comprising a balanced classification dataset and a finely annotated video localization dataset. Notably, this is the first effort to incorporate expert validation into the annotation process, significantly enhancing label reliability. The benchmark also includes zero-shot evaluation protocols and visualization tools for both proprietary and open-source models. Experimental results reveal that proprietary models achieve superior overall performance but tend to over-predict transitional segments, whereas open-source models are more conservative yet exhibit higher miss rates. The dataset and tools will be publicly released to advance research in educational AI.

📝 Abstract
Recognizing and localizing student confusion from video is an important yet challenging problem in educational AI. Existing confusion datasets suffer from noisy labels, coarse temporal annotations, and limited expert validation, which hinder reliable fine-grained recognition and temporally grounded analysis. To address these limitations, we propose a practical multi-stage filtering pipeline that integrates two stages of model-assisted screening, researcher curation, and expert validation to build a higher-quality benchmark for confusion understanding. Based on this pipeline, we introduce ConfusionBench, a new benchmark for educational videos consisting of a balanced confusion recognition dataset and a video localization dataset. We further provide zero-shot baseline evaluations of a representative open-source model and a proprietary model on clip-level confusion recognition and long-video confusion localization tasks. Experimental results show that the proprietary model performs better overall but tends to over-predict transitional segments, while the open-source model is more conservative and more prone to missed detections. In addition, the proposed student confusion report visualization can support educational experts in making intervention decisions and adapting learning plans accordingly. All datasets and related materials will be made publicly available on our project page.
Problem

Research questions and friction points this paper is trying to address.

confusion recognition
confusion localization
educational videos
noisy labels
temporal annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

confusion recognition
expert validation
temporal localization
educational video benchmark
multi-stage filtering