Do Large Language Models Grasp The Grammar? Evidence from Grammar-Book-Guided Probing in Luxembourgish

📅 2025-10-28

🤖 AI Summary

This study addresses the lack of systematic frameworks for evaluating the grammatical competence of large language models (LLMs) in low-resource languages. Using Luxembourgish as a case study, it proposes the first grammar-book-guided, four-stage evaluation framework. Methodologically, it combines probe-based analysis, minimal-pair testing, grammar-driven task design, and an assessment that separates semantics from syntax, examining morphological, syntactic, and semantic–syntactic mapping capabilities at multiple levels. Key contributions include: (1) systematically incorporating pedagogical grammar resources into low-resource language evaluation; (2) showing that LLMs' translation performance correlates only weakly with their grammatical understanding; and (3) showing that models rely heavily on semantic inference, which masks syntactic deficiencies. The evidence for (3) is notably poor performance on morphological inflection and minimal-pair tasks, which suggests that the models' syntactic representations are not robust.

📝 Abstract

Grammar refers to the system of rules that governs the structural organization of, and the semantic relations among, linguistic units such as sentences, phrases, and words within a given language. In natural language processing, there remains a notable scarcity of grammar-focused evaluation protocols, a gap that is even more pronounced for low-resource languages. Moreover, the extent to which large language models genuinely comprehend grammatical structure, especially the mapping between syntactic structures and meanings, remains under debate. To investigate this issue, we propose a Grammar-Book-Guided evaluation pipeline, consisting of four key stages, that is intended to provide a systematic and generalizable framework for grammar evaluation. In this work we take Luxembourgish as a case study. The results show a weak positive correlation between translation performance and grammatical understanding, indicating that strong translations do not necessarily imply deep grammatical competence. Larger models perform well overall due to their semantic strength but remain weak in morphology and syntax, struggling particularly with Minimal Pair tasks, while strong reasoning ability offers a promising way to enhance their grammatical understanding.
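To make the correlation claim concrete, here is a minimal sketch of how a rank correlation between per-model translation scores and grammar scores could be computed. The abstract does not say which correlation coefficient the authors use, so Spearman's rho is an assumption here, and the function names and the example scores are hypothetical placeholders, not figures from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Hypothetical per-model scores, for illustration only (not from the paper).
    translation_scores = [0.61, 0.72, 0.55, 0.80, 0.68]
    grammar_scores = [0.40, 0.38, 0.45, 0.52, 0.41]
    print(f"Spearman rho: {spearman(translation_scores, grammar_scores):.3f}")
```

A rho close to zero on real scores would match the reported pattern: ranking models by translation quality tells you little about their ranking on grammar tasks.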

Problem

Research questions and friction points this paper is trying to address.

Evaluating grammatical understanding in large language models

Addressing grammar evaluation gaps in low-resource languages

Investigating syntax-semantics mapping competence through systematic probing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Grammar-Book-Guided evaluation pipeline for grammar assessment

Systematic framework with four stages for low-resource languages

Probing models via Minimal Pair tasks and reasoning enhancement
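As a rough illustration of the Minimal Pair idea mentioned above, the sketch below scores a model on sentence pairs that differ in one grammatical feature. It assumes a common setup in which a pair counts as correct when the model assigns a strictly higher score to the grammatical sentence; the paper's actual protocol may differ (for example, it may prompt the model to choose between options). `MinimalPair`, `minimal_pair_accuracy`, and `toy_score` are hypothetical names, and the English sentences are stand-ins rather than items from the Luxembourgish benchmark.

```python
from typing import Callable, Iterable, NamedTuple


class MinimalPair(NamedTuple):
    """Two sentences that differ only in one grammatical feature."""
    grammatical: str
    ungrammatical: str


def minimal_pair_accuracy(
    pairs: Iterable[MinimalPair],
    score: Callable[[str], float],
) -> float:
    """Fraction of pairs where the grammatical sentence scores strictly higher.

    `score` is any sentence-level scoring function, e.g. a model's
    log-probability of the sentence. Ties count as incorrect.
    """
    pair_list = list(pairs)
    if not pair_list:
        raise ValueError("no minimal pairs supplied")
    correct = sum(
        1 for pair in pair_list
        if score(pair.grammatical) > score(pair.ungrammatical)
    )
    return correct / len(pair_list)


def toy_score(sentence: str) -> float:
    """Stand-in scorer (assumption): penalizes one known agreement error.

    A real evaluation would replace this with a language model's score.
    """
    return -1.0 if "cats sleeps" in sentence else 0.0


if __name__ == "__main__":
    # English stand-in pairs for illustration; not benchmark data.
    pairs = [
        MinimalPair("The cats sleep.", "The cats sleeps."),
        MinimalPair("She walks home.", "She walk home."),
    ]
    print(f"Accuracy: {minimal_pair_accuracy(pairs, toy_score):.2f}")
```

Counting ties as incorrect is a deliberate choice in this sketch: a scorer that cannot distinguish the two sentences has not shown sensitivity to the grammatical contrast.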
