MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs

📅 2025-04-03
🤖 AI Summary
Large language models (LLMs) have lacked rigorous evaluation of grammatical competence in low-resource languages. Method: The paper introduces MultiBLiMP 1.0, a massively multilingual benchmark of linguistic minimal pairs that covers 101 languages and 6 linguistic phenomena and contains more than 125,000 minimal pairs. The pairs are created with a fully automated pipeline that draws on the large-scale linguistic resources of Universal Dependencies and UniMorph. Contribution/Results: Evaluating LLMs at this multilingual scale highlights the shortcomings of current state-of-the-art models on low-resource languages.

📝 Abstract
We introduce MultiBLiMP 1.0, a massively multilingual benchmark of linguistic minimal pairs, covering 101 languages, 6 linguistic phenomena and containing more than 125,000 minimal pairs. Our minimal pairs are created using a fully automated pipeline, leveraging the large-scale linguistic resources of Universal Dependencies and UniMorph. MultiBLiMP 1.0 evaluates abilities of LLMs at an unprecedented multilingual scale, and highlights the shortcomings of the current state-of-the-art in modelling low-resource languages.
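
A minimal pair contrasts a grammatical sentence with a minimally different ungrammatical one. Benchmarks in the BLiMP family typically count a model as correct when it assigns the higher probability to the grammatical member. The sketch below illustrates that scoring rule under stated assumptions: `score` is a hypothetical callable standing in for an LLM's sentence log-probability, and the toy scores and example sentences are made up for illustration, not taken from MultiBLiMP 1.0.

```python
from typing import Callable, Iterable, Tuple

MinimalPair = Tuple[str, str]  # (grammatical, ungrammatical)


def minimal_pair_accuracy(
    pairs: Iterable[MinimalPair],
    score: Callable[[str], float],
) -> float:
    """Fraction of pairs where the grammatical sentence scores strictly higher."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


# Hypothetical stand-in for an LLM: a table of made-up log-probabilities.
TOY_LOGPROBS = {
    "The dogs bark.": -4.0,
    "The dogs barks.": -7.5,
    "The cat sleeps.": -3.8,
    "The cat sleep.": -3.1,  # the toy scorer prefers the ungrammatical form here
}


def toy_score(sentence: str) -> float:
    return TOY_LOGPROBS[sentence]


pairs = [
    ("The dogs bark.", "The dogs barks."),
    ("The cat sleeps.", "The cat sleep."),
]
accuracy = minimal_pair_accuracy(pairs, toy_score)
print(accuracy)
```

In practice `toy_score` would be replaced by a function that sums the token log-probabilities a language model assigns to the sentence, and accuracy would be reported per language and per phenomenon.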
Problem

Research questions and friction points this paper is trying to address.

Evaluates LLMs' abilities across 101 languages
Assesses 6 linguistic phenomena using minimal pairs
Highlights shortcomings in low-resource language modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated pipeline for minimal pairs creation
Leverages Universal Dependencies and UniMorph
Benchmark covering 101 languages
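
The summary does not spell out how the pipeline works, so the following is only a rough sketch of one way an annotated sentence and an inflection table could be combined to produce a minimal pair automatically. Everything here is an assumption made for illustration: the `Token` class, the `LEXICON` table, and the `make_minimal_pair` function are hypothetical, the number-agreement swap is just one possible phenomenon, and the real pipeline may differ substantially.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Token:
    form: str
    lemma: str
    upos: str
    number: Optional[str] = None  # "Sing" / "Plur", as in a UD Number feature


# Hypothetical inflection table in the spirit of UniMorph: (lemma, number) -> form.
LEXICON: Dict[Tuple[str, str], str] = {
    ("bark", "Sing"): "barks",
    ("bark", "Plur"): "bark",
}

OPPOSITE = {"Sing": "Plur", "Plur": "Sing"}


def make_minimal_pair(tokens: List[Token]) -> Optional[Tuple[str, str]]:
    """Return (grammatical, ungrammatical), or None if no swap is possible."""
    for i, tok in enumerate(tokens):
        if tok.upos != "VERB":
            continue
        target = OPPOSITE.get(tok.number)
        if target is None:
            continue
        swapped = LEXICON.get((tok.lemma, target))
        # Skip missing or identical forms: the swap must change the surface string.
        if swapped is None or swapped == tok.form:
            continue
        good = [t.form for t in tokens]
        bad = good[:i] + [swapped] + good[i + 1:]
        return " ".join(good), " ".join(bad)
    return None


sentence = [
    Token("The", "the", "DET"),
    Token("dogs", "dog", "NOUN", "Plur"),
    Token("bark", "bark", "VERB", "Plur"),
]
pair = make_minimal_pair(sentence)
print(pair)
```

The check that the swapped form differs from the original matters because many languages have identical forms for different feature values, and such a swap would not yield an ungrammatical sentence.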