🤖 AI Summary
This work investigates how negation impairs the ability of large language models (LLMs) to detect hallucinations, showing that current models perform significantly worse on negated inputs than on affirmative ones, often producing logically inconsistent or unfaithful judgments due to flawed negation comprehension. To address this, we introduce NegHalu, the first benchmark specifically designed for hallucination detection under negation, constructed by systematically rephrasing samples from existing datasets to incorporate negated expressions and augmented with token-level analysis of models' internal states during negation processing. The study answers three core questions about how negation interferes with hallucination identification and uncovers a fundamental deficiency in how LLMs model negation logic. Experiments show an average 23.7% performance drop across mainstream models on NegHalu, highlighting critical gaps in negation understanding and providing insights for research on trustworthy AI and robust logical reasoning.
📝 Abstract
Hallucination in large language models (LLMs) has been actively studied in natural language processing. However, the impact of negated text on hallucination detection with LLMs remains largely unexplored. In this paper, we pose three important yet unanswered research questions and aim to address them. To derive the answers, we investigate whether LLMs can recognize the contextual shifts caused by negation and still distinguish hallucinations as reliably as in affirmative cases. We also construct the NegHalu dataset by reconstructing existing hallucination detection datasets with negated expressions. Our experiments demonstrate that LLMs struggle to detect hallucinations in negated text, often producing logically inconsistent or unfaithful judgments. Moreover, we trace the internal states of LLMs as they process negated inputs at the token level, revealing the challenges of mitigating their unintended effects.
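To make the dataset-construction idea concrete, the sketch below shows one way a hallucination-detection sample could be rephrased with negation. The sample format, the naive `negate_claim` rule, and the label flip are illustrative assumptions for this sketch only, not the actual NegHalu construction procedure described in the paper:

```python
# Toy sketch of negated rephrasing for a hallucination-detection sample.
# Assumptions (not from the paper): samples are dicts with a context, a
# claim, and a boolean "hallucinated" label; negating a claim that was
# faithful to the context makes it unfaithful, and vice versa.

def negate_claim(claim: str) -> str:
    """Naively negate a copular claim by inserting 'not' after 'is'/'are'."""
    for copula in (" is ", " are "):
        if copula in claim:
            return claim.replace(copula, copula.rstrip() + " not ", 1)
    # Fallback for non-copular claims.
    return "It is not the case that " + claim[0].lower() + claim[1:]

def negate_sample(sample: dict) -> dict:
    """Rephrase the claim with negation and flip the hallucination label."""
    return {
        "context": sample["context"],
        "claim": negate_claim(sample["claim"]),
        "hallucinated": not sample["hallucinated"],
    }

sample = {
    "context": "The Eiffel Tower is located in Paris.",
    "claim": "The Eiffel Tower is in Paris.",
    "hallucinated": False,
}
neg = negate_sample(sample)
print(neg["claim"])         # The Eiffel Tower is not in Paris.
print(neg["hallucinated"])  # True
```

A rule this crude would of course mishandle many sentences; the point is only to illustrate why negated rephrasing shifts the context-claim relationship that a detector must judge.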