Targeted Syntactic Evaluation of Language Models on Georgian Case Alignment

📅 2026-02-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study evaluates how well language models handle nominative, ergative, and dative case marking in Georgian, a low-resource language with a split-ergative alignment system. Using a treebank and the Grew query language, the authors construct an evaluation set of 370 minimal pairs across seven syntactic tasks (50–70 instances each). They test five encoder and two decoder-only models, establishing the first syntactic benchmark for Georgian and proposing a methodology generalizable to other low-resource languages. The test set is publicly released. Results show that model performance tracks case-form frequency (NOM > DAT > ERG), with ergative marking consistently the weakest, which suggests that data scarcity and the highly specific role of the ergative jointly contribute to the errors.

📝 Abstract

This paper evaluates the performance of transformer-based language models on split-ergative case alignment in Georgian, a particularly rare system for assigning grammatical cases to mark argument roles. We focus on subject and object marking determined through various permutations of nominative, ergative, and dative noun forms. A treebank-based approach for the generation of minimal pairs using the Grew query language is implemented. We create a dataset of 370 syntactic tests made up of seven tasks containing 50–70 samples each, where three noun forms are tested in any given sample. Five encoder- and two decoder-only models are evaluated with word- and/or sentence-level accuracy metrics. Regardless of the specific syntactic makeup, models performed worst in assigning the ergative case correctly and best in assigning the nominative case correctly. Performance correlated with the overall frequency distribution of the three forms (NOM > DAT > ERG). Though data scarcity is a known issue for low-resource languages, we show that the highly specific role of the ergative, along with a lack of available training data, likely contributes to poor performance on this case. The dataset is made publicly available, and the methodology provides an interesting avenue for future syntactic evaluations of languages where benchmarks are limited.
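As a rough illustration of sentence-level accuracy over minimal pairs, the sketch below counts a pair as correct when a scorer prefers the grammatical variant, then groups accuracy by the case form under test. Everything here is an assumption made for illustration: the `MinimalPair` type, the `accuracy_by_case` function, the placeholder sentence strings, and the toy scores are hypothetical and are not taken from the paper, which also reports word-level metrics and tests three noun forms per sample.

```python
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple


class MinimalPair(NamedTuple):
    """Hypothetical container for one test item (not the paper's data format)."""
    grammatical: str
    ungrammatical: str
    target_case: str  # e.g. "NOM", "ERG", "DAT"


def accuracy_by_case(
    pairs: List[MinimalPair],
    score: Callable[[str], float],
) -> Dict[str, float]:
    """Share of pairs per case where the grammatical sentence scores higher."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for pair in pairs:
        total[pair.target_case] += 1
        if score(pair.grammatical) > score(pair.ungrammatical):
            correct[pair.target_case] += 1
    return {case: correct[case] / total[case] for case in total}


# Placeholder identifiers stand in for real sentences; the scores are invented
# stand-ins for a model's sentence log-probabilities.
pairs = [
    MinimalPair("nom_ok", "nom_bad", "NOM"),
    MinimalPair("erg_ok_1", "erg_bad_1", "ERG"),
    MinimalPair("erg_ok_2", "erg_bad_2", "ERG"),
]
toy_scores = {
    "nom_ok": -1.0, "nom_bad": -3.0,
    "erg_ok_1": -2.0, "erg_bad_1": -2.5,
    "erg_ok_2": -4.0, "erg_bad_2": -3.0,
}
results = accuracy_by_case(pairs, toy_scores.__getitem__)
```

In practice the `score` callable would wrap a language model, for example a summed token log-probability for decoder-only models or a pseudo-log-likelihood for encoders; which scoring scheme the authors used is not stated in this summary.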

Problem

Research questions and friction points this paper is trying to address.

case alignment

ergativity

Georgian

syntactic evaluation

language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

split-ergativity

minimal pairs

treebank-based evaluation