LaoBench: A Large-Scale Multidimensional Lao Benchmark for Large Language Models

📅 2025-11-14
🤖 AI Summary
To address the absence of large-scale, multidimensional evaluation benchmarks for low-resource Lao, this paper introduces LaoBench—the first comprehensive benchmark tailored for the Lao language. It spans three core dimensions: knowledge application, foundational education, and trilingual (Lao–Chinese–English) translation, comprising over 17,000 high-quality, expert-annotated, culturally adapted, and pedagogically valuable samples. Methodologically, it pioneers an integrated data pipeline combining expert-led annotation with agent-assisted validation, and implements a dual-track evaluation framework—supporting both open- and closed-source model assessment—to balance fairness and data security. Empirical evaluation across mainstream large language models reveals substantial performance degradation on Lao tasks, confirming LaoBench’s rigor and utility. This work establishes a critical infrastructure for AI evaluation of low-resource Southeast Asian languages.

📝 Abstract
The rapid advancement of large language models (LLMs) has not been matched by their evaluation in low-resource languages, especially Southeast Asian languages like Lao. To fill this gap, we introduce LaoBench, the first large-scale, high-quality, and multidimensional benchmark dataset dedicated to assessing LLMs' comprehensive language understanding and reasoning abilities in Lao. LaoBench comprises over 17,000 carefully curated samples spanning three core dimensions: knowledge application, K12 foundational education, and bilingual translation among Lao, Chinese, and English. The dataset is divided into open-source and closed-source subsets, with the closed-source portion enabling black-box evaluation on an official platform to ensure fairness and data security. Our data construction pipeline integrates expert human curation with automated agent-assisted verification, ensuring linguistic accuracy, cultural relevance, and educational value. Benchmarking multiple state-of-the-art LLMs on LaoBench reveals that current models still face significant challenges in mastering Lao across diverse tasks. We hope LaoBench will catalyze further research and development of AI technologies for underrepresented Southeast Asian languages.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM capabilities in the low-resource Lao language
Assessing comprehensive understanding across knowledge application, education, and translation
Addressing performance gaps in underrepresented Southeast Asian languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multidimensional benchmark dataset for Lao language evaluation
Human curation combined with automated agent-assisted verification
Closed-source subset enables black-box evaluation platform
Jian Gao
China-ASEAN Information Harbor Co., Ltd., Nanning, China
Richeng Xuan
Beijing Academy of Artificial Intelligence, Beijing, China
Zhaolu Kang
Beijing Academy of Artificial Intelligence, Beijing, China; School of Software & Microelectronics, Peking University, Beijing, China
Dingshi Liao
China-ASEAN Information Harbor Co., Ltd., Nanning, China
Wenxin Huang
China-ASEAN Information Harbor Co., Ltd., Nanning, China
Zongmou Huang
China-ASEAN Information Harbor Co., Ltd., Nanning, China
Yangdi Xu
China-ASEAN Information Harbor Co., Ltd., Nanning, China
Bowen Qin
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Zheqi He
Beijing Academy of Artificial Intelligence
Xi Yang
Beijing Academy of Artificial Intelligence, Beijing, China
Changjin Li
China-ASEAN Information Harbor Co., Ltd., Nanning, China