🤖 AI Summary
Hallucination in large language models (LLMs) remains poorly understood across low-resource languages, particularly in conversational settings. Method: This study systematically investigates hallucination in Hindi, Persian, and Chinese dialogues using six state-of-the-art LLMs—GPT-3.5, GPT-4o, Llama-3.1, Gemma-2.0, DeepSeek-R1, and Qwen-3. We construct a multilingual dialogue dataset via hybrid human annotation and automated evaluation, quantifying hallucination rates along two orthogonal dimensions: factual consistency and linguistic accuracy. Contribution/Results: We uncover statistically significant cross-lingual variation: Chinese exhibits the lowest hallucination rate, while Hindi and Persian show markedly higher rates. This disparity reveals critical influences of training data abundance, morphological complexity, and corpus bias on generative reliability. Our findings establish a novel, interpretable benchmark for multilingual LLM trustworthiness assessment and advance mechanistic understanding of hallucination determinants in under-resourced languages.
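As a rough illustration of the quantification step described above, the following Python sketch computes per-language hallucination rates along the two dimensions (factual consistency and linguistic accuracy) from labeled dialogue responses. The record schema and field names (`language`, `factual_error`, `linguistic_error`) are illustrative assumptions, not the paper's actual data format or evaluation pipeline.

```python
from collections import defaultdict

# Hypothetical annotation records: one per model response, flagged along the
# two dimensions used in the study. Field names are assumptions for this sketch.
annotations = [
    {"language": "Hindi",    "factual_error": True,  "linguistic_error": False},
    {"language": "Farsi",    "factual_error": False, "linguistic_error": True},
    {"language": "Mandarin", "factual_error": False, "linguistic_error": False},
    # ... one record per annotated response ...
]

def hallucination_rates(records):
    """Return per-language error rates for the factual and linguistic dimensions."""
    totals = defaultdict(int)
    factual = defaultdict(int)
    linguistic = defaultdict(int)
    for r in records:
        lang = r["language"]
        totals[lang] += 1
        factual[lang] += r["factual_error"]      # bool counts as 0/1
        linguistic[lang] += r["linguistic_error"]
    return {
        lang: {
            "factual_rate": factual[lang] / totals[lang],
            "linguistic_rate": linguistic[lang] / totals[lang],
        }
        for lang in totals
    }

print(hallucination_rates(annotations))
```

Cross-lingual comparison then amounts to contrasting these per-language rates (e.g., with a significance test over the annotated responses).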
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable proficiency in generating text that closely resembles human writing. However, they often produce factually incorrect statements, a problem commonly referred to as 'hallucination'. Addressing hallucination is crucial for improving the reliability and effectiveness of LLMs. While much research has focused on hallucination in English, our study extends this investigation to conversational data in three languages: Hindi, Farsi, and Mandarin. We present a comprehensive analysis of a conversational dataset, examining both factual and linguistic errors in these languages for GPT-3.5, GPT-4o, Llama-3.1, Gemma-2.0, DeepSeek-R1, and Qwen-3. We find that these LLMs produce very few hallucinated responses in Mandarin but generate significantly more hallucinations in Hindi and Farsi.