From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation

📅 2025-07-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Korean large language model (LLM) evaluation benchmarks suffer from insufficient domain specificity and limited real-world applicability. Method: This work introduces two expert-level Korean benchmarks—KMMLU-Redux, grounded in Korea’s National Technical Qualification Examinations, and KMMLU-Pro, the first benchmark incorporating national professional licensure examinations. We systematically reconstruct and refine the original KMMLU dataset through rigorous human verification, error correction, and data cleaning, establishing a standardized, multi-domain framework for assessing specialized knowledge. Contribution/Results: The benchmarks significantly enhance fidelity in evaluating LLMs’ domain knowledge retention and application in industrial contexts. Empirical results demonstrate that they provide a more comprehensive and reliable assessment of LLM performance across Korean professional domains. Both benchmarks are publicly released to foster community advancement in Korean-language LLM evaluation.

📝 Abstract
The development of Large Language Models (LLMs) requires robust benchmarks that encompass not only academic domains but also industrial fields to effectively evaluate their applicability in real-world scenarios. In this paper, we introduce two Korean expert-level benchmarks. KMMLU-Redux, reconstructed from the existing KMMLU, consists of questions from the Korean National Technical Qualification exams, with critical errors removed to enhance reliability. KMMLU-Pro is based on Korean National Professional Licensure exams to reflect professional knowledge in Korea. Our experiments demonstrate that these benchmarks comprehensively represent industrial knowledge in Korea. We release our dataset publicly.
Problem

Research questions and friction points this paper is trying to address.

Develop robust Korean benchmarks for LLM evaluation
Enhance reliability by removing critical errors in KMMLU-Redux
Reflect professional knowledge via KMMLU-Pro licensure exams
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconstructs KMMLU-Redux from existing benchmark
Develops KMMLU-Pro based on licensure exams
Ensures benchmarks reflect industrial knowledge
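The multi-domain evaluation these benchmarks enable boils down to accuracy over multiple-choice items, reported overall and per professional domain. A minimal sketch of that scoring step is below; the item schema (`domain` and `answer` keys holding a domain label and a gold answer letter) is an assumption for illustration, not the released dataset's actual format.

```python
from collections import defaultdict

def score_mcq(items, predictions):
    """Compute overall and per-domain accuracy for multiple-choice items.

    `items`: list of dicts with hypothetical keys "domain" (e.g. a
    qualification-exam field) and "answer" (gold letter such as "A").
    `predictions`: parallel list of predicted answer letters.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        domain = item["domain"]
        total[domain] += 1
        # Normalize the prediction before comparing to the gold letter.
        if pred.strip().upper() == item["answer"]:
            correct[domain] += 1
    per_domain = {d: correct[d] / total[d] for d in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_domain
```

Per-domain breakdowns matter here because an aggregate score can hide weak performance in a single licensure field.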