LaoBench: A Large-Scale Multidimensional Lao Benchmark for Large Language Models

📅 2025-11-14
🤖 AI Summary
To address the absence of large-scale, multidimensional evaluation benchmarks for low-resource Lao, this paper introduces LaoBench—the first comprehensive benchmark tailored for the Lao language. It spans three core dimensions: knowledge application, foundational education, and trilingual (Lao–Chinese–English) translation, comprising over 17,000 high-quality, expert-annotated, culturally adapted, and pedagogically valuable samples. Methodologically, it pioneers an integrated data pipeline combining expert-led annotation with agent-assisted validation, and implements a dual-track evaluation framework—supporting both open- and closed-source model assessment—to balance fairness and data security. Empirical evaluation across mainstream large language models reveals substantial performance degradation on Lao tasks, confirming LaoBench’s rigor and utility. This work establishes a critical infrastructure for AI evaluation of low-resource Southeast Asian languages.

📝 Abstract
The rapid advancement of large language models (LLMs) has not been matched by their evaluation in low-resource languages, especially Southeast Asian languages like Lao. To fill this gap, we introduce LaoBench, the first large-scale, high-quality, and multidimensional benchmark dataset dedicated to assessing LLMs' comprehensive language understanding and reasoning abilities in Lao. LaoBench comprises over 17,000 carefully curated samples spanning three core dimensions: knowledge application, K12 foundational education, and bilingual translation among Lao, Chinese, and English. The dataset is divided into open-source and closed-source subsets, with the closed-source portion enabling black-box evaluation on an official platform to ensure fairness and data security. Our data construction pipeline integrates expert human curation with automated agent-assisted verification, ensuring linguistic accuracy, cultural relevance, and educational value. Benchmarking multiple state-of-the-art LLMs on LaoBench reveals that current models still face significant challenges in mastering Lao across diverse tasks. We hope LaoBench will catalyze further research and development of AI technologies for underrepresented Southeast Asian languages.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM capabilities in the low-resource Lao language
Assessing comprehensive understanding across knowledge application, education, and translation
Addressing performance gaps in underrepresented Southeast Asian languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multidimensional benchmark dataset for Lao language evaluation
Human curation combined with automated agent-assisted verification
Closed-source subset enables black-box evaluation platform
Jian Gao
China-ASEAN Information Harbor Co., Ltd., Nanning, China
Richeng Xuan
Beijing Academy of Artificial Intelligence, Beijing, China
Zhaolu Kang
Beijing Academy of Artificial Intelligence, Beijing, China; School of Software & Microelectronics, Peking University, Beijing, China
Dingshi Liao
China-ASEAN Information Harbor Co., Ltd., Nanning, China
Wenxin Huang
China-ASEAN Information Harbor Co., Ltd., Nanning, China
Zongmou Huang
China-ASEAN Information Harbor Co., Ltd., Nanning, China
Yangdi Xu
China-ASEAN Information Harbor Co., Ltd., Nanning, China
Bowen Qin
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Zheqi He
Beijing Academy of Artificial Intelligence
Xi Yang
Beijing Academy of Artificial Intelligence, Beijing, China
Changjin Li
China-ASEAN Information Harbor Co., Ltd., Nanning, China