Detecting LLM Fact-conflicting Hallucinations Enhanced by Temporal-logic-based Reasoning

📅 2025-02-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models (LLMs) frequently generate fact-conflicting hallucinations (FCHs), including in temporal reasoning tasks. To address this, the paper proposes Drowzee, an end-to-end metamorphic testing framework. Its core contributions are: (i) integrating temporal logic (TL) into hallucination detection, which enables automatic construction of temporally sensitive test cases with ground-truth answers from sources such as Wikipedia; (ii) two semantic-aware oracles that compare the semantic structure of the model's answers and reasoning steps against the ground truth; and (iii) a fully automated, annotation-free evaluation pipeline that combines knowledge graph structuring, templated prompting, and dual semantic alignment via embeddings and dependency structures. In experiments across nine LLMs and nine knowledge domains, Drowzee detects non-temporal hallucinations at rates of 24.7%–59.8% and temporal hallucinations at rates of 16.7%–39.2%.

📝 Abstract

Large language models (LLMs) face the challenge of hallucinations: outputs that seem coherent but are actually incorrect. A particularly damaging type is fact-conflicting hallucination (FCH), where generated content contradicts established facts. Addressing FCH presents three main challenges: 1) automatically constructing and maintaining large-scale benchmark datasets is difficult and resource-intensive; 2) generating complex and efficient test cases that the LLM has not been trained on, especially those involving intricate temporal features, is challenging yet crucial for eliciting hallucinations; and 3) validating the reasoning behind LLM outputs is inherently difficult, particularly with complex logical relationships, as it requires transparency in the model's decision-making process. This paper presents Drowzee, an end-to-end metamorphic testing framework that uses temporal logic to identify FCHs in LLMs. Drowzee builds a comprehensive factual knowledge base by crawling sources like Wikipedia and uses automated temporal-logic reasoning to convert this knowledge into a large, extensible set of test cases with ground-truth answers. LLMs are tested on these cases through template-based prompts, which require them to generate both answers and reasoning steps. To validate the reasoning, we propose two semantic-aware oracles that compare the semantic structure of LLM outputs to the ground truths. Across nine LLMs in nine different knowledge domains, experimental results show that Drowzee detects non-temporal hallucinations at rates ranging from 24.7% to 59.8%, and temporal hallucinations at rates ranging from 16.7% to 39.2%.
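To make the test-generation idea concrete, here is a minimal Python sketch of how time-stamped facts might be turned into question/answer pairs with ground truth. It is an illustration under stated assumptions, not Drowzee's implementation: the `Fact` class, the single "ended before" rule, the prompt template, and the toy facts are all hypothetical, and the abstract does not specify the actual temporal-logic operators or templates.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Fact:
    """A time-stamped fact (hypothetical schema, not taken from the paper)."""
    subject: str
    relation: str
    obj: str
    start: int  # first year in which the fact holds (inclusive)
    end: int    # last year in which the fact holds (inclusive)


def ended_before(a: Fact, b: Fact) -> bool:
    """One simple temporal rule: a's interval ends strictly before b's begins."""
    return a.end < b.start


def describe(f: Fact) -> str:
    return f"{f.subject} {f.relation} {f.obj}"


# Hypothetical prompt template asking for both an answer and reasoning steps.
TEMPLATE = (
    "Answer Yes or No and list your reasoning steps.\n"
    "Question: Did the period in which {a} end before "
    "the period in which {b} began?"
)


def generate_test_cases(facts):
    """Derive one test case, with its ground-truth answer, per ordered pair of facts."""
    cases = []
    for a, b in permutations(facts, 2):
        cases.append({
            "prompt": TEMPLATE.format(a=describe(a), b=describe(b)),
            "ground_truth": "Yes" if ended_before(a, b) else "No",
        })
    return cases


# Toy facts invented purely for illustration.
facts = [
    Fact("Person A", "led", "Organization X", 2001, 2004),
    Fact("Person B", "led", "Organization X", 2006, 2010),
]
cases = generate_test_cases(facts)
for case in cases:
    print(case["ground_truth"], "<-", case["prompt"].splitlines()[1])
```

Because the ground truth is computed from the stored intervals rather than annotated by hand, adding facts to the knowledge base extends the test set automatically, which is the property the abstract emphasizes.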

Problem

Research questions and friction points this paper is trying to address.

Detecting fact-conflicting hallucinations in LLMs

Automating temporal-logic-based test case generation

Validating LLM reasoning with semantic-aware oracles

Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal-logic-based reasoning framework

Automated test case generation

Semantic-aware validation oracles
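As a rough illustration of what a semantic-aware oracle could look like, the Python sketch below checks a model response in two ways: whether the final answer matches the ground truth, and whether the reasoning overlaps sufficiently with the ground-truth facts. Everything here is an assumption made for illustration: the abstract does not define the two oracles, so the triple representation, the Jaccard similarity measure, the `threshold` value, and the function names are hypothetical, and extracting triples from free-text reasoning is taken as already done.

```python
def normalize(triple):
    """Lowercase and trim each part so trivial formatting differences are ignored."""
    return tuple(part.strip().lower() for part in triple)


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def check_response(answer, reasoning_triples, truth_answer, truth_triples,
                   threshold=0.5):
    """Flag a response as a suspected hallucination if either check fails.

    threshold is a hypothetical cut-off, not a value from the paper.
    """
    # Check 1: does the final answer agree with the ground truth?
    answer_ok = answer.strip().lower() == truth_answer.strip().lower()

    # Check 2: do the reasoning triples overlap enough with the ground truth?
    model = {normalize(t) for t in reasoning_triples}
    truth = {normalize(t) for t in truth_triples}
    similarity = jaccard(model, truth)
    reasoning_ok = similarity >= threshold

    return {
        "answer_ok": answer_ok,
        "reasoning_ok": reasoning_ok,
        "similarity": similarity,
        "hallucination": not (answer_ok and reasoning_ok),
    }


# Toy ground truth and two toy responses, invented for illustration.
truth = [("event a", "before", "event b"), ("event b", "before", "event c")]
good = check_response("yes", [("Event A", "before", "Event B"),
                              ("Event B", "before", "Event C")], "Yes", truth)
bad = check_response("Yes", [("event c", "before", "event a")], "Yes", truth)
print(good["hallucination"], bad["hallucination"])
```

Checking the reasoning separately from the answer matters because a model can reach the right answer through wrong steps; in the second toy response the answer matches but the reasoning does not, so it is still flagged.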

🔎 Similar Papers

AutoHall: Automated Hallucination Dataset Generation for Large Language Models