🤖 AI Summary
This work investigates how negation impairs the ability of large language models (LLMs) to detect hallucinations, showing that current models perform significantly worse on negated inputs than on affirmative ones, often producing logically inconsistent or unfaithful judgments due to flawed negation comprehension. To address this, we introduce NegHalu, the first benchmark specifically designed for hallucination detection under negation, constructed by systematically rephrasing samples from existing datasets to incorporate negated expressions and augmented with token-level analysis of models' internal states during negation processing. The study answers three core questions about how negation interferes with hallucination identification and uncovers a fundamental deficiency in how LLMs model negation logic. Experiments show an average 23.7% performance drop across mainstream models on NegHalu, highlighting critical gaps in negation understanding and providing insights for research on trustworthy AI and robust logical reasoning.
📝 Abstract
Hallucination in large language models (LLMs) has been actively studied in natural language processing. However, the impact of negated text on hallucination detection with LLMs remains largely unexplored. In this paper, we pose three important yet unanswered research questions and aim to address them. To derive the answers, we investigate whether LLMs can recognize the contextual shifts caused by negation and still distinguish hallucinations as reliably as in affirmative cases. We also construct the NegHalu dataset by reconstructing existing hallucination detection datasets with negated expressions. Our experiments demonstrate that LLMs struggle to detect hallucinations in negated text, often producing logically inconsistent or unfaithful judgments. Moreover, we trace the internal states of LLMs as they process negated inputs at the token level, revealing the challenges of mitigating their unintended effects.
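To make the dataset-construction idea concrete, the sketch below shows one way a hallucination-detection sample could be rephrased with negation. The sample format, the naive `negate_claim` rule, and the label flip are illustrative assumptions for this sketch only, not the actual NegHalu construction procedure described in the paper:

```python
# Toy sketch of negated rephrasing for a hallucination-detection sample.
# Assumptions (not from the paper): samples are dicts with a context, a
# claim, and a boolean "hallucinated" label; negating a claim that was
# faithful to the context makes it unfaithful, and vice versa.

def negate_claim(claim: str) -> str:
    """Naively negate a copular claim by inserting 'not' after 'is'/'are'."""
    for copula in (" is ", " are "):
        if copula in claim:
            return claim.replace(copula, copula.rstrip() + " not ", 1)
    # Fallback for non-copular claims.
    return "It is not the case that " + claim[0].lower() + claim[1:]

def negate_sample(sample: dict) -> dict:
    """Rephrase the claim with negation and flip the hallucination label."""
    return {
        "context": sample["context"],
        "claim": negate_claim(sample["claim"]),
        "hallucinated": not sample["hallucinated"],
    }

sample = {
    "context": "The Eiffel Tower is located in Paris.",
    "claim": "The Eiffel Tower is in Paris.",
    "hallucinated": False,
}
neg = negate_sample(sample)
print(neg["claim"])         # The Eiffel Tower is not in Paris.
print(neg["hallucinated"])  # True
```

A rule this crude would of course mishandle many sentences; the point is only to illustrate why negated rephrasing shifts the context-claim relationship that a detector must judge.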