Evaluating Multimodal Generative AI with Korean Educational Standards

📅 2025-02-21

🤖 AI Summary

This study addresses the lack of evaluation benchmarks for multimodal generative AI in educational settings for less-explored languages, specifically Korean. It introduces KoNET, a Korean national educational test benchmark spanning elementary school through college entrance. KoNET is constructed from Korea's national curriculum standards and draws on four standardized national exams (KoEGED, KoMGED, KoHGED, and KoCSAT). The study proposes a structured parsing pipeline, a cross-level knowledge coverage assessment, and a unified evaluation framework that supports open-source, open-access, and closed-API models, complemented by a comparison against human error rates. Key contributions include: (1) a multimodal, multi-stage, multidisciplinary, and high-difficulty Korean education benchmark; (2) empirical identification of capability gaps and subject-specific biases in state-of-the-art models on Korean educational reasoning; and (3) a planned open-source release of the code and dataset builder, which helps fill a gap in non-English educational AI evaluation.

📝 Abstract

This paper presents the Korean National Educational Test Benchmark (KoNET), a new benchmark designed to evaluate Multimodal Generative AI Systems using Korean national educational tests. KoNET comprises four exams: the Korean Elementary General Educational Development Test (KoEGED), the Middle (KoMGED) and High (KoHGED) counterparts of that test, and the Korean College Scholastic Ability Test (KoCSAT). These exams are renowned for their rigorous standards and diverse questions, facilitating a comprehensive analysis of AI performance across different educational levels. By focusing on Korean, KoNET provides insights into model performance in less-explored languages. We assess a range of models (open-source, open-access, and closed APIs) by examining difficulties, subject diversity, and human error rates. The code and dataset builder will be fully open-sourced at https://github.com/naver-ai/KoNET.
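
The kind of analysis the abstract describes (scores per exam, set against human error rates) can be sketched roughly as follows. This is a minimal illustration and not the KoNET codebase: only the four exam names come from the abstract, while the record fields (`exam`, `prediction`, `answer`, `human_error_rate`), the 0.5 threshold, and the toy records are all assumptions made for the example.

```python
from collections import defaultdict

# Exam names come from the abstract; everything else is a hypothetical schema.
EXAMS = ("KoEGED", "KoMGED", "KoHGED", "KoCSAT")


def accuracy_by_exam(records):
    """Return {exam: accuracy} over records that carry an 'exam' label."""
    correct, total = defaultdict(int), defaultdict(int)
    for rec in records:
        total[rec["exam"]] += 1
        correct[rec["exam"]] += int(rec["prediction"] == rec["answer"])
    # Skip exams with no records so we never divide by zero.
    return {exam: correct[exam] / total[exam] for exam in EXAMS if total[exam]}


def accuracy_by_human_difficulty(records, threshold=0.5):
    """Split items by human error rate (assumed per-item field) and score each half."""
    buckets = {"easy": [0, 0], "hard": [0, 0]}  # [correct, total]
    for rec in records:
        key = "hard" if rec["human_error_rate"] >= threshold else "easy"
        buckets[key][0] += int(rec["prediction"] == rec["answer"])
        buckets[key][1] += 1
    return {name: c / n for name, (c, n) in buckets.items() if n}


# Toy records invented purely to exercise the functions (not KoNET data).
toy_records = [
    {"exam": "KoEGED", "prediction": 2, "answer": 2, "human_error_rate": 0.10},
    {"exam": "KoEGED", "prediction": 1, "answer": 3, "human_error_rate": 0.20},
    {"exam": "KoCSAT", "prediction": 4, "answer": 4, "human_error_rate": 0.70},
    {"exam": "KoCSAT", "prediction": 5, "answer": 1, "human_error_rate": 0.80},
]

print(accuracy_by_exam(toy_records))
print(accuracy_by_human_difficulty(toy_records))
```

Bucketing by human error rate is only one way to relate model and human performance; a correlation between per-item model correctness and human error rate would serve a similar purpose if per-item rates are available.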

Problem

Research questions and friction points this paper is trying to address.

Benchmark for Multimodal AI evaluation

Korean educational standards integration

Comprehensive analysis across educational levels

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Generative AI evaluation

Korean educational test benchmark

Open-source dataset and code

🔎 Similar Papers

Beyond Text-to-Text: An Overview of Multimodal and Generative Artificial Intelligence for Education Using Topic Modeling

2024-09-24 · arXiv.org · Citations: 2