TurBLiMP: A Turkish Benchmark of Linguistic Minimal Pairs

📅 2025-06-16

📈 Citations: 0

✨ Influential: 0

career value

158K/year

🤖 AI Summary

Turkish lacks a dedicated syntactic evaluation benchmark addressing its word-order flexibility and morphological dependency. Method: We introduce TurBLiMP—the first minimal-pair benchmark for Turkish—comprising 16 syntactic phenomena and 16,000 minimal pairs. Our approach systematically covers core grammatical features, incorporates human acceptability judgments as a behavioral baseline, and integrates cross-model consistency analysis with statistical hypothesis testing. Contribution/Results: TurBLiMP enables the first fine-grained, controlled assessment of Turkish syntactic competence in language models. We find that state-of-the-art LMs underperform humans substantially on verb inflection and subordinate clause identification—exhibiting error rates 30–50% higher. These results expose critical deficiencies in modeling morphological complexity and word-order sensitivity. TurBLiMP thus provides a standardized, linguistically grounded evaluation tool for assessing Turkish syntactic capabilities in monolingual and multilingual models.

Technology Category

Application Category

📝 Abstract

We introduce TurBLiMP, the first Turkish benchmark of linguistic minimal pairs, designed to evaluate the linguistic abilities of monolingual and multilingual language models (LMs). Covering 16 linguistic phenomena with 1000 minimal pairs each, TurBLiMP fills an important gap in linguistic evaluation resources for Turkish. In designing the benchmark, we give extra attention to two properties of Turkish that remain understudied in current syntactic evaluations of LMs, namely word order flexibility and subordination through morphological processes. Our experiments on a wide range of LMs and a newly collected set of human acceptability judgments reveal that even cutting-edge Large LMs still struggle with grammatical phenomena that are not challenging for humans, and may also exhibit different sensitivities to word order and morphological complexity compared to humans.

Problem

Research questions and friction points this paper is trying to address.

Evaluates linguistic abilities of monolingual and multilingual language models

Addresses gap in Turkish linguistic evaluation resources

Examines word order flexibility and morphological subordination challenges

Innovation

Methods, ideas, or system contributions that make the work stand out.

Turkish benchmark with 1000 minimal pairs

Focus on word order flexibility

Evaluate morphological subordination complexity

🔎 Similar Papers

No similar papers found.