LingGym: How Far Are LLMs from Thinking Like Field Linguists?

📅 2025-10-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates whether large language models (LLMs) can perform meta-linguistic reasoning, such as part-of-speech identification and syntactic structure inference, in low-resource and unseen languages, emulating the analytical process of field linguists. It introduces LingGym, a benchmark built from 18 typologically diverse reference grammars, and proposes the Word-Gloss Inference task, the first to systematically evaluate cross-level inference among morphology, part-of-speech tags, and contextual usage. The evaluation is controlled: the structured linguistic cues given to the model (interlinear glossed text (IGT), grammatical explanations, and translations) are varied one at a time. Incorporating these cues yields consistent performance gains across models, which shows the promise of LLMs for linguistic typology analysis and endangered language documentation, while generalization bottlenecks persist in zero-shot cross-lingual meta-linguistic reasoning.

📝 Abstract

This paper introduces LingGym, a new benchmark that evaluates LLMs' capacity for meta-linguistic reasoning using Interlinear Glossed Text (IGT) and grammatical descriptions extracted from 18 typologically diverse reference grammars. Unlike previous work that focuses on specific downstream tasks, we assess whether LLMs can generalize linguistic inference across low-resource languages and structures not seen during training. We present a controlled evaluation task: Word-Gloss Inference, in which the model must infer a missing word and gloss from context using varying levels of linguistic information (e.g., glosses, grammatical explanations, translations). Our results show that incorporating structured linguistic cues leads to consistent improvements in reasoning performance across all models. This work highlights both the promise and current limitations of using LLMs for typologically informed linguistic analysis and low-resource language documentation.
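The Word-Gloss Inference setup described above can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the `IGTExample` class, the `build_prompt` function, the prompt wording, and the toy sentence are all hypothetical and are not drawn from the benchmark or from any of the 18 grammars. The sketch masks one word together with its gloss and lets the caller switch each kind of cue on or off, which is one plausible way to realize the "varying levels of linguistic information" the abstract mentions.

```python
from dataclasses import dataclass
from typing import List, Optional

MASK = "____"  # placeholder shown in place of the hidden word and gloss


@dataclass
class IGTExample:
    """One interlinear glossed text item (hypothetical structure)."""
    words: List[str]                    # source-language tokens
    glosses: List[str]                  # one gloss per token, aligned by index
    translation: str                    # free translation
    explanation: Optional[str] = None   # grammatical description, if available


def build_prompt(ex: IGTExample, mask_index: int, use_gloss: bool = True,
                 use_explanation: bool = False,
                 use_translation: bool = False) -> str:
    """Hide the word (and its gloss) at mask_index; include only the chosen cues."""
    if len(ex.words) != len(ex.glosses):
        raise ValueError("words and glosses must be aligned one-to-one")
    if not 0 <= mask_index < len(ex.words):
        raise IndexError("mask_index is out of range")

    words = list(ex.words)
    words[mask_index] = MASK
    lines = ["Text: " + " ".join(words)]

    if use_gloss:
        glosses = list(ex.glosses)
        glosses[mask_index] = MASK  # the target gloss stays hidden as well
        lines.append("Gloss: " + " ".join(glosses))
    if use_explanation and ex.explanation:
        lines.append("Explanation: " + ex.explanation)
    if use_translation:
        lines.append("Translation: " + ex.translation)

    lines.append("Fill in the missing word and its gloss.")
    return "\n".join(lines)


# Invented toy item with placeholder tokens; not real language data.
example = IGTExample(
    words=["tok1", "tok2-sfx", "tok3"],
    glosses=["dog", "see-PST", "cat"],
    translation="The dog saw the cat.",
    explanation="The suffix -sfx marks past tense.",
)

print(build_prompt(example, mask_index=1, use_gloss=True, use_translation=True))
```

Calling `build_prompt` with different flag combinations on the same item gives a set of prompts that differ only in the cues provided, so any change in accuracy can be attributed to the added cue rather than to the item itself.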

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' meta-linguistic reasoning using diverse language grammars

Assessing generalization of linguistic inference across low-resource languages

Testing word-gloss inference with structured linguistic cues and explanations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark evaluates meta-linguistic reasoning with Interlinear Glossed Text

Assesses generalization across low-resource languages and structures

Incorporates structured linguistic cues to improve reasoning performance

🔎 Similar Papers

Large Language Models Meet NLP: A Survey