🤖 AI Summary
Research on visual object detection in educational videos is hindered by a lack of benchmark datasets. Method: This paper introduces LVVO, the first large-scale, education-specific visual object detection benchmark, comprising 4,000 frames extracted from 245 lecture videos in biology, computer science, and geosciences. Frames are annotated with bounding boxes for four object categories: Table, Chart-Graph, Photographic-image, and Visual-illustration. The authors formally define these canonical visual object categories for educational contexts and label 1,000 frames with a dual-annotator protocol followed by expert arbitration, reaching an inter-annotator F1 score of 83.41%. A semi-supervised strategy that integrates confidence-based filtering and model self-training then annotates the remaining 3,000 frames automatically, forming the LVVO_3k subset. Contribution/Results: LVVO_1k (1,000 human-verified frames) and LVVO_3k (3,000 automatically annotated frames) are publicly released, establishing the first dedicated benchmark for visual object detection in educational videos and enabling rigorous development and evaluation of both supervised and semi-supervised detection methods.
📝 Abstract
We introduce the Lecture Video Visual Objects (LVVO) dataset, a new benchmark for visual object detection in educational video content. The dataset consists of 4,000 frames extracted from 245 lecture videos spanning biology, computer science, and geosciences. A subset of 1,000 frames, referred to as LVVO_1k, has been manually annotated with bounding boxes for four visual categories: Table, Chart-Graph, Photographic-image, and Visual-illustration. Each frame was labeled independently by two annotators, yielding an inter-annotator F1 score of 83.41%, which indicates strong agreement. To ensure high-quality consensus annotations, a third expert reviewed and resolved all cases of disagreement. To expand the dataset, a semi-supervised approach was employed to automatically annotate the remaining 3,000 frames, forming LVVO_3k. The complete dataset offers a valuable resource for developing and evaluating both supervised and semi-supervised methods for visual content detection in educational videos. The LVVO dataset is publicly available to support further research in this domain.
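To make the inter-annotator F1 score concrete, the following is a minimal Python sketch of one common way to compute agreement between two annotators' bounding boxes on a single frame. The abstract does not specify the matching procedure, so everything here is an assumption for illustration: boxes are paired greedily by intersection-over-union (IoU), a pair counts as a match only if both annotators chose the same category, and the IoU threshold of 0.5 is a conventional default rather than a value reported for LVVO. The names `iou` and `agreement_f1` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Labeled = Tuple[str, Box]                # (category, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def agreement_f1(ann_a: List[Labeled], ann_b: List[Labeled],
                 iou_threshold: float = 0.5) -> float:
    """F1 agreement between two annotators on one frame.

    Assumption: greedy one-to-one matching of same-category boxes,
    highest IoU first. The threshold is illustrative, not from the paper.
    """
    # All same-category candidate pairs, best overlap first.
    pairs = sorted(
        ((iou(box_a, box_b), i, j)
         for i, (cls_a, box_a) in enumerate(ann_a)
         for j, (cls_b, box_b) in enumerate(ann_b)
         if cls_a == cls_b),
        reverse=True,
    )
    used_a, used_b = set(), set()
    matches = 0
    for score, i, j in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap even less
        if i in used_a or j in used_b:
            continue  # each box may be matched at most once
        used_a.add(i)
        used_b.add(j)
        matches += 1
    total = len(ann_a) + len(ann_b)
    # F1 = 2*TP / (2*TP + FP + FN) = 2*matches / (|A| + |B|)
    return 2 * matches / total if total else 1.0


if __name__ == "__main__":
    annotator_1 = [("Table", (0, 0, 10, 10)), ("Chart-Graph", (20, 20, 40, 40))]
    annotator_2 = [("Table", (1, 1, 10, 10)), ("Visual-illustration", (20, 20, 40, 40))]
    print(agreement_f1(annotator_1, annotator_2))
```

In this toy frame the two annotators agree on the table but disagree on the category of the second object, so only one of the four boxes on each side is matched. A frame like this is the kind of disagreement that the third expert would resolve. A dataset-level score could be obtained by pooling matches and box counts over all frames, which is again an assumption about the aggregation.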