Investigating Symbolic Triggers of Hallucination in Gemma Models Across HaluEval and TruthfulQA

📅 2025-09-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates how symbolic inputs, such as modifiers and named entities, systematically induce hallucinations in large language models (LLMs). Method: Leveraging the HaluEval and TruthfulQA benchmarks, we propose a question-format transformation technique to conduct cross-scale hallucination analysis across the Gemma-2 family (2B, 9B, and 27B parameters). Contribution/Results: We identify and quantitatively characterize symbolic hallucination triggers for the first time: such inputs induce hallucination rates of roughly 84-95% across all model scales, significantly higher than the overall average hallucination rates (2B: 79.0% → 27B: 63.9%). This reveals a deep, scale-invariant deficiency in LLMs' symbolic semantic parsing capabilities. Our findings provide critical empirical evidence and interpretable insights into the intrinsic mechanisms underlying hallucinations, thereby informing the development of more robust evaluation frameworks and targeted model improvements.
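The paper does not ship code here, but the transformation it describes is easy to picture. Below is a minimal sketch, assuming spaCy for tagging the symbolic properties the paper studies (modifiers and named entities); the helper names and the true/false recasting are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch, not the authors' released code: tag the symbolic
# properties the paper studies (modifiers, named entities) and recast a QA
# pair into an alternative probe format. Requires:
#   pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def symbolic_properties(question: str) -> dict:
    """Extract modifiers (adjectives/adverbs) and named entities."""
    doc = nlp(question)
    return {
        "modifiers": [t.text for t in doc if t.pos_ in ("ADJ", "ADV")],
        "named_entities": [ent.text for ent in doc.ents],
    }

def to_verification(question: str, answer: str) -> str:
    """One plausible instance of the paper's question-format conversion:
    turn free-form QA into a true/false answer-verification prompt."""
    return (f"Question: {question}\n"
            f"Proposed answer: {answer}\n"
            "Is the proposed answer correct? Reply 'true' or 'false'.")

item = {"question": "Which famous novel did George Orwell publish in 1949?",
        "answer": "Nineteen Eighty-Four"}
print(symbolic_properties(item["question"]))  # 'famous' is a modifier trigger
print(to_verification(item["question"], item["answer"]))
```

Probes tagged this way can then be grouped by property, so hallucination rates for modifier-bearing versus entity-bearing inputs can be compared directly.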

📝 Abstract
Hallucination in Large Language Models (LLMs) is a well-studied problem. However, the properties that make LLMs intrinsically vulnerable to hallucinations have not been identified and studied. This research identifies and characterizes these key properties, allowing us to pinpoint vulnerabilities within the model's internal mechanisms. To isolate these properties, we utilize two established datasets, HaluEval and TruthfulQA, and convert their existing question-answering format into various other formats to narrow down these properties as the cause of the hallucinations. Our findings reveal that hallucination percentages across symbolic properties are notably high for Gemma-2-2B, averaging 79.0% across tasks and datasets. With increased model scale, hallucination drops to 73.6% for Gemma-2-9B and 63.9% for Gemma-2-27B, a reduction of roughly 15 percentage points overall. Although the hallucination rate decreases as model size increases, a substantial amount of hallucination caused by symbolic properties persists. This is especially evident for modifiers (84.76% to 94.98%) and named entities (83.87% to 93.96%) across all Gemma models and both datasets. These findings indicate that symbolic elements continue to confuse the models, pointing to a fundamental weakness in how these LLMs process such inputs, regardless of their scale.
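For concreteness, the scale comparison above reduces to per-model, per-property hallucination rates over judged outputs. A minimal sketch of that bookkeeping follows; the record fields and model labels are illustrative assumptions, not the paper's data schema.

```python
# Hypothetical bookkeeping for the cross-scale comparison: each judged record
# notes which model answered, which symbolic property the probe targeted, and
# whether the response was judged a hallucination.
from collections import defaultdict

records = [
    {"model": "gemma-2-2b",  "property": "modifier",     "hallucinated": True},
    {"model": "gemma-2-2b",  "property": "named_entity", "hallucinated": True},
    {"model": "gemma-2-27b", "property": "modifier",     "hallucinated": False},
    # ... one record per (model, probe) judgment
]

def hallucination_rates(records):
    """Percent of hallucinated responses per (model, symbolic property)."""
    totals, hits = defaultdict(int), defaultdict(int)
    for r in records:
        key = (r["model"], r["property"])
        totals[key] += 1
        hits[key] += r["hallucinated"]
    return {k: 100.0 * hits[k] / totals[k] for k in totals}

for (model, prop), rate in sorted(hallucination_rates(records).items()):
    print(f"{model:12s} {prop:13s} {rate:5.1f}%")
```

Averaging these rates within a model reproduces the per-scale figures quoted in the abstract, while the per-property breakdown surfaces the modifier and named-entity ranges.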
Problem

Research questions and friction points this paper is trying to address.

Identifying symbolic triggers causing hallucinations in Gemma models
Investigating model vulnerability to modifiers and named entities
Analyzing hallucination persistence across model scales and datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identified symbolic triggers causing model hallucinations
Converted HaluEval and TruthfulQA question-answering formats into alternative task formats
Analyzed hallucination patterns across different model scales
Naveen Lamba
Center for Artificial Intelligence in Medicine, Imaging and Forensics, Sharda University, Greater Noida, India
Sanju Tiwari
Center for Artificial Intelligence in Medicine, Imaging and Forensics, Sharda University, Greater Noida, India
Manas Gaur
University of Maryland Baltimore County
Neurosymbolic AI · Knowledge-infused Learning · Artificial Intelligence · NLP · Knowledge Graphs