Prompt and circumstance: A word-by-word LLM prompting approach to interlinear glossing for low-resource languages

📅 2025-02-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Manual interlinear glossed text (IGT) annotation for low-resource languages is costly and hampered by scarce linguistic tools. Method: We propose a retrieval-based, word-level prompting framework that supplies natural-language linguistic instructions to a large language model (LLM), enabling fine-grained word- and morpheme-level glossing. The approach supports both interactive assisted annotation and fully automatic IGT generation, and includes an instruction self-generation mechanism demonstrated in a case study on Tsez. Contribution/Results: Evaluated on the seven languages of the SIGMORPHON 2023 shared task, the method beats the BERT-based baseline in the morpheme-level score for every language; a simple 3-best oracle exceeds the competition-winning sequence model in word-level scores for five languages; and self-generated instructions reduce errors on a confusing grammatical feature in Tsez. The results demonstrate that LLMs can both make suggestions to human annotators and follow linguistic directions, pointing to an accessible and interpretable approach to low-resource language documentation.

📝 Abstract
Partly automated creation of interlinear glossed text (IGT) has the potential to assist in linguistic documentation. We argue that LLMs can make this process more accessible to linguists because of their capacity to follow natural-language instructions. We investigate the effectiveness of a retrieval-based LLM prompting approach to glossing, applied to the seven languages from the SIGMORPHON 2023 shared task. Our system beats the BERT-based shared task baseline for every language in the morpheme-level score category, and we show that a simple 3-best oracle has higher word-level scores than the challenge winner (a tuned sequence model) in five languages. In a case study on Tsez, we ask the LLM to automatically create and follow linguistic instructions, reducing errors on a confusing grammatical feature. Our results thus demonstrate the potential contributions which LLMs can make in interactive systems for glossing, both in making suggestions to human annotators and following directions.
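The retrieval-based prompting idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the retrieval heuristic (word overlap), the prompt wording, and the Tsez-like example data are all assumptions made for the sketch.

```python
# Sketch of retrieval-based few-shot prompting for word-level glossing.
# Corpus entries, retrieval scoring, and prompt format are illustrative only.

def retrieve_examples(word, corpus, k=2):
    """Rank glossed training sentences by whether they contain the target
    word, breaking ties by how often it occurs; return the top k."""
    def score(entry):
        tokens = entry["sentence"].split()
        return (word in tokens, sum(t == word for t in tokens))
    return sorted(corpus, key=score, reverse=True)[:k]

def build_prompt(word, sentence, corpus):
    """Assemble a glossing prompt for one target word from retrieved examples."""
    lines = ["Gloss the target word, using the examples below.", ""]
    for ex in retrieve_examples(word, corpus):
        lines.append(f"Sentence: {ex['sentence']}")
        lines.append(f"Gloss:    {ex['gloss']}")
        lines.append("")
    lines.append(f"Sentence: {sentence}")
    lines.append(f"Target word: {word}")
    lines.append("Gloss:")
    return "\n".join(lines)

# Tiny hypothetical glossed corpus (Tsez-like forms, for illustration).
corpus = [
    {"sentence": "besuro-bi b-ik'i-s", "gloss": "fish-PL III.PL-go-PST"},
    {"sentence": "kid y-ik'i-s", "gloss": "girl II-go-PST"},
]
print(build_prompt("b-ik'i-s", "besuro-bi b-ik'i-s", corpus))
```

The LLM's completion of the final `Gloss:` line would then be the system's suggestion for that word, which an annotator can accept or correct in the interactive setting.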
Problem

Research questions and friction points this paper is trying to address.

Automate interlinear glossing for low-resource languages.
Improve linguistic documentation using LLM prompting.
Enhance accuracy in glossing with retrieval-based LLMs.
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM prompting for glossing
Retrieval-based LLM approach
Interactive linguistic instruction creation