🤖 AI Summary
Large language models (LLMs) perform far worse on China's minority languages (Tibetan, Uyghur, Kazakh, and Mongolian) than on high-resource languages, owing to their low-resource status, their diverse writing systems (Tibetan script; Arabic-based scripts for Uyghur and Kazakh; Cyrillic and traditional Mongolian scripts for Mongolian), and their rich morphology. Method: This paper introduces MiLiC-Eval, the first systematic multi-task evaluation benchmark tailored to these languages. It comprises nine language understanding and reasoning tasks with 24K human-verified instances, enabling unified, fine-grained assessment across all four languages and their non-Latin scripts, with particular emphasis on grammatical sensitivity and cross-script capability. Results: Experiments show that state-of-the-art multilingual LLMs average below 40% accuracy on grammar-intensive and multi-script tasks. MiLiC-Eval provides standardized prompt templates, Unicode-compatible automated scoring, and reproducible diagnostic tools, establishing a foundational benchmark for adapting and improving LLMs on low-resource languages.
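To make "Unicode-compatible automated scoring" concrete: for scripts like Tibetan and Arabic-based Uyghur/Kazakh, the same syllable can be encoded as different codepoint sequences, so scoring must normalize before comparing. The sketch below is illustrative only, not MiLiC-Eval's actual API; `normalize` and `exact_match_accuracy` are hypothetical helper names.

```python
# Hedged sketch of Unicode-aware exact-match scoring for non-Latin scripts.
# Helper names are illustrative, not the benchmark's actual interface.
import unicodedata

def normalize(text: str) -> str:
    # Canonical (NFC) normalization maps canonically equivalent strings to
    # one form; essential for Tibetan and Arabic-based scripts, where
    # visually identical text can have different codepoint sequences.
    return unicodedata.normalize("NFC", text).strip()

def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    assert len(predictions) == len(references)
    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predictions, references))
    return hits / len(references)

# Example: two canonically equivalent encodings of the same Tibetan syllable.
preds = ["\u0F40\u0F73"]        # KA + precomposed vowel sign II
refs = ["\u0F40\u0F71\u0F72"]   # KA + vowel signs AA + I (decomposed)
print(exact_match_accuracy(preds, refs))  # 1.0 after normalization
```

Without normalization, the two strings above differ at the codepoint level despite rendering identically, so a naive byte-wise exact match would wrongly score a correct answer as an error.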
📝 Abstract
Large language models (LLMs) excel in high-resource languages but struggle with low-resource languages (LRLs), particularly those spoken by minority communities in China, such as Tibetan, Uyghur, Kazakh, and Mongolian. To systematically track the progress in these languages, we introduce MiLiC-Eval, a benchmark designed for minority languages in China, featuring 24K instances across 9 tasks. MiLiC-Eval focuses on underrepresented writing systems and provides a fine-grained assessment of linguistic and problem-solving skills. Our evaluation reveals that LLMs perform poorly on syntax-intensive tasks and multi-script languages. We further demonstrate how MiLiC-Eval can help advance LRL research in handling diverse writing systems and understanding the process of language adaptation.