🤖 AI Summary
Multilingual large language models (LLMs) exhibit pervasive factual hallucinations about geopolitically sensitive, data-scarce regions such as North Korea, where linguistic and informational asymmetries make outputs even less reliable.
Method: We systematically evaluate hallucination rates across English, Korean, and Chinese interfaces of leading LLMs (e.g., LLaMA, Qwen, Gemini) using cross-lingual consistency analysis, controlled prompt engineering, and dual-track fact-checking (human-in-the-loop + rule-based verification).
Contribution/Results: We introduce the first multilingual hallucination assessment framework tailored to extreme geopolitical scenarios, going beyond monolingual or single-source validation. Results reveal up to 47% disparity in hallucination rates for identical events across language interfaces, with pronounced divergence in political narrative framing, which suggests that language choice itself acts as an implicit bias channel. This work provides a quantifiable risk calibration methodology and theoretical foundation for trustworthy LLM deployment in high-stakes domains.
📝 Abstract
Hallucination in large language models (LLMs) remains a significant challenge for their safe deployment, particularly because of its potential to spread misinformation. Most existing solutions address this challenge by aligning models with credible sources or by improving how models communicate their confidence (or lack thereof) in their outputs. While these measures may be effective in most contexts, they may fall short where access to accurate data is limited or where determining which sources are credible is itself difficult. In this study, we take North Korea, a country characterised by an extreme lack of reliable sources and the prevalence of sensationalist falsehoods, as a case study. We explore and evaluate how some of the best-performing multilingual LLMs and language-specific models generate information about North Korea in three languages spoken in countries with significant geopolitical interests: English (United States, United Kingdom), Korean (South Korea), and Mandarin Chinese (China). Our findings reveal significant differences, suggesting that the choice of model and language can lead to vastly different understandings of North Korea, which has important implications given the global security challenges the country poses.