MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs

📅 2025-04-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing large language models (LLMs) lack rigorous evaluation of grammatical competence, particularly morphosyntactic judgment, in low-resource languages. Method: The paper introduces MultiBLiMP 1.0, a massively multilingual benchmark of linguistic minimal pairs covering 101 languages and six linguistic phenomena, with more than 125,000 minimal pairs. The pairs are created with a fully automated pipeline that draws on Universal Dependencies and UniMorph, which lets construction scale across languages. Contribution/Results: The evaluation highlights the shortcomings of current state-of-the-art LLMs in modelling low-resource languages. MultiBLiMP 1.0 enables fine-grained grammatical assessment of LLMs across more than one hundred languages.

📝 Abstract

We introduce MultiBLiMP 1.0, a massively multilingual benchmark of linguistic minimal pairs, covering 101 languages, 6 linguistic phenomena and containing more than 125,000 minimal pairs. Our minimal pairs are created using a fully automated pipeline, leveraging the large-scale linguistic resources of Universal Dependencies and UniMorph. MultiBLiMP 1.0 evaluates abilities of LLMs at an unprecedented multilingual scale, and highlights the shortcomings of the current state-of-the-art in modelling low-resource languages.
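To make the idea of a minimal pair concrete, the sketch below shows one common way such benchmarks are scored: a model counts as correct on a pair when it assigns a higher score to the grammatical sentence than to the ungrammatical one. This is an assumption about the scoring convention, not a description taken from the abstract. The function `minimal_pair_accuracy`, the bigram table, and `toy_score` are hypothetical stand-ins; in practice the scorer would be an LLM's sentence log-probability.

```python
from typing import Callable, Iterable, Tuple

Pair = Tuple[str, str]  # (grammatical sentence, ungrammatical sentence)


def minimal_pair_accuracy(pairs: Iterable[Pair],
                          score: Callable[[str], float]) -> float:
    """Fraction of pairs where the grammatical sentence scores strictly higher."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


# Hypothetical toy scorer: a tiny bigram table standing in for an LLM.
BIGRAM_LOGPROB = {
    ("the", "dogs"): -1.0,
    ("the", "dog"): -1.0,
    ("dogs", "bark"): -0.5,
    ("dog", "barks"): -0.5,
}
UNSEEN_LOGPROB = -5.0  # penalty for bigrams the toy table has never seen


def toy_score(sentence: str) -> float:
    """Sum of bigram log-probabilities over a whitespace-tokenised sentence."""
    tokens = sentence.lower().split()
    return sum(BIGRAM_LOGPROB.get(bigram, UNSEEN_LOGPROB)
               for bigram in zip(tokens, tokens[1:]))


pairs = [
    ("The dogs bark", "The dogs barks"),  # subject-verb number agreement
    ("The dog barks", "The dog bark"),
]
accuracy = minimal_pair_accuracy(pairs, toy_score)
print(f"accuracy = {accuracy:.2f}")
```

Ties are counted as errors here because the comparison is strict; a different convention could be used if the benchmark specifies one.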

Problem

Research questions and friction points this paper is trying to address.

Evaluates LLMs' grammatical abilities across 101 languages

Assesses 6 linguistic phenomena using minimal pairs

Highlights shortcomings in low-resource language modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fully automated pipeline for creating minimal pairs

Leverages Universal Dependencies and UniMorph

Benchmark covering 101 languages
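The points above can be illustrated with a small sketch of how a minimal pair might be derived automatically from annotated data. The summary does not describe the actual pipeline, so everything here is an assumption: the lexicon is a hand-written toy in the spirit of a UniMorph inflection table, the feature strings follow UD-style `Attribute=Value` notation for readability, and `swap_feature` and `make_minimal_pair` are hypothetical helper names.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Features = FrozenSet[str]

# Hypothetical toy lexicon: (lemma, features) -> inflected form.
LEXICON: Dict[Tuple[str, Features], str] = {
    ("bark", frozenset({"Number=Sing", "Person=3"})): "barks",
    ("bark", frozenset({"Number=Plur", "Person=3"})): "bark",
}


def swap_feature(features: Features, attribute: str, new_value: str) -> Features:
    """Return a copy of `features` with one attribute set to a new value."""
    kept = {f for f in features if not f.startswith(attribute + "=")}
    kept.add(f"{attribute}={new_value}")
    return frozenset(kept)


def make_minimal_pair(tokens: List[str], target_index: int, lemma: str,
                      features: Features, attribute: str,
                      new_value: str) -> Optional[Tuple[str, str]]:
    """Swap one feature of the target token to build an ungrammatical twin.

    Returns None when the lexicon has no form for the swapped features, or
    when the swapped form is identical to the original (no contrast).
    """
    wrong_form = LEXICON.get((lemma, swap_feature(features, attribute, new_value)))
    if wrong_form is None or wrong_form == tokens[target_index]:
        return None
    corrupted = list(tokens)
    corrupted[target_index] = wrong_form
    return " ".join(tokens), " ".join(corrupted)


pair = make_minimal_pair(
    tokens=["The", "dogs", "bark"],
    target_index=2,
    lemma="bark",
    features=frozenset({"Number=Plur", "Person=3"}),
    attribute="Number",
    new_value="Sing",
)
print(pair)
```

Skipping cases where the swapped form is missing or identical to the original keeps syncretic forms from producing pairs with no real grammatical contrast.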
