ConfusionBench: An Expert-Validated Benchmark for Confusion Recognition and Localization in Educational Videos

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing datasets for detecting student confusion in educational videos often suffer from label noise, coarse temporal annotations, and a lack of expert validation, hindering fine-grained recognition and precise temporal localization. To address these limitations, this work proposes a multi-stage data curation pipeline that integrates model-assisted filtering, manual verification, and expert validation to construct ConfusionBench—a high-quality benchmark comprising a balanced classification dataset and a finely annotated video localization dataset. Notably, this is the first effort to incorporate expert validation into the annotation process, significantly enhancing label reliability. The benchmark also includes zero-shot evaluation protocols and visualization tools for both proprietary and open-source models. Experimental results reveal that proprietary models achieve superior overall performance but tend to over-predict transitional segments, whereas open-source models are more conservative yet exhibit higher miss rates. The dataset and tools will be publicly released to advance research in educational AI.

📝 Abstract
Recognizing and localizing student confusion from video is an important yet challenging problem in educational AI. Existing confusion datasets suffer from noisy labels, coarse temporal annotations, and limited expert validation, which hinder reliable fine-grained recognition and temporally grounded analysis. To address these limitations, we propose a practical multi-stage filtering pipeline that integrates two stages of model-assisted screening, researcher curation, and expert validation to build a higher-quality benchmark for confusion understanding. Based on this pipeline, we introduce ConfusionBench, a new benchmark for educational videos consisting of a balanced confusion recognition dataset and a video localization dataset. We further provide zero-shot baseline evaluations of a representative open-source model and a proprietary model on clip-level confusion recognition and long-video confusion localization tasks. Experimental results show that the proprietary model performs better overall but tends to over-predict transitional segments, while the open-source model is more conservative and more prone to missed detections. In addition, the proposed student confusion report visualization can support educational experts in making intervention decisions and adapting learning plans accordingly. All datasets and related materials will be made publicly available on our project page.
Problem

Research questions and friction points this paper is trying to address.

confusion recognition
confusion localization
educational videos
noisy labels
temporal annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

confusion recognition
expert validation
temporal localization
educational video benchmark
multi-stage filtering