🤖 AI Summary
This study investigates hallucination in large language models (LLMs) during Irish translation, specifically the generation of non-existent lexical items, and examines its potential implicit influence on the lexical evolution of low-resource, highly inflectional languages. Method: We systematically compare translations produced by GPT-4o and GPT-4o mini, using linguistic annotation, validation against morphological rules, and qualitative discourse analysis. Contribution/Results: We identify six systematic noun hallucination patterns and categorize verb-related hallucinations. Both models exhibit highly consistent hallucination typologies, but GPT-4o mini produces them significantly more often. Crucially, most hallucinated forms superficially conform to Irish morphosyntactic constraints, suggesting that LLMs may contribute to language evolution through a “synthetic fluency” mechanism: producing plausible but unattested forms. The study proposes a conceptual framework for assessing LLM-induced linguistic impact, offering methodological grounding and empirical evidence for evaluating the effects of LLMs on endangered and low-resource languages.
📝 Abstract
This study examines hallucinations in Large Language Model (LLM) translations into Irish, focusing specifically on instances where the models generate novel, non-existent words. We classify these hallucinations into verb and noun categories and identify six distinct patterns among the latter. We also analyse whether these hallucinations adhere to Irish morphological rules and what linguistic tendencies they exhibit. Our findings show that while GPT-4o and GPT-4o mini produce similar types of hallucinations, the mini model generates them at a significantly higher frequency. Beyond classification, the discussion raises speculative questions about the implications of these hallucinations for the Irish language. Rather than seeking definitive answers, we offer food for thought on the increasing use of LLMs and their potential role in shaping Irish vocabulary and linguistic evolution, with the aim of prompting discussion on how such technologies might influence language over time, particularly in low-resource, morphologically rich languages.