🤖 AI Summary
Fine-grained student behavior analysis in educational settings is hindered by the absence of realistic, multi-label action datasets captured in authentic classroom environments. To address this gap, we introduce SAV, the first large-scale, multi-label student action video dataset curated from real classrooms, comprising 4,324 annotated video clips spanning 15 distinct action classes and explicitly capturing challenging conditions such as small objects, high subject density, and severe occlusion. We further propose an education-oriented visual transformer baseline that integrates fine-grained local attention with spatiotemporal modeling to handle subtle action discrimination and dense interaction recognition. Evaluated on SAV, our model achieves a mean Average Precision (mAP) of 67.9%, substantially outperforming existing methods. Both the dataset and source code are publicly released to foster reproducible research in educational behavioral analytics.
📝 Abstract
Analyzing student actions is an important and challenging task in educational research. Existing efforts have been hampered by the lack of accessible datasets that capture the nuanced action dynamics of classrooms. In this paper, we present a new multi-label Student Action Video (SAV) dataset, specifically designed for action detection in classroom settings. The SAV dataset consists of 4,324 carefully trimmed video clips from 758 different classrooms, annotated with 15 distinct student actions. Compared to existing action detection datasets, the SAV dataset stands out by providing a wide range of real classroom scenarios, high-quality video data, and unique challenges, including subtle movement differences, dense object engagement, significant scale differences, varied shooting angles, and visual occlusion. These complexities present both new opportunities and new challenges for advancing action detection methods. To benchmark the dataset, we propose a novel baseline method based on a visual transformer, designed to enhance attention to key local details within small and dense object regions. Our method demonstrates strong performance, achieving a mean Average Precision (mAP) of 67.9% and 27.4% on the SAV and AVA datasets, respectively. This paper not only provides the dataset but also calls for further research into AI-driven educational tools that may transform teaching methodologies and learning outcomes. The code and dataset are released at https://github.com/Ritatanz/SAV.