JointCQ: Improving Factual Hallucination Detection with Joint Claim and Query Generation

📅 2025-10-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently generate factually inconsistent outputs—so-called hallucinations. Existing hallucination detection methods suffer from two critical limitations: (1) loss of contextual information during claim extraction, and (2) insufficient specificity in verification query generation. To address these issues, we propose the first end-to-end joint modeling framework that simultaneously optimizes claim extraction and verification query generation. Our approach employs fine-grained context-aware modeling to mitigate information decay and introduces a training strategy based on controllable synthetic data filtering to enhance query discriminability. Evaluated on multiple open-domain question-answering hallucination detection benchmarks—including HOVER and FEVER-Sym—our method achieves significant improvements over state-of-the-art approaches. Downstream retrieval-based verification accuracy increases by 4.2–7.8 percentage points on average, demonstrating enhanced robustness and interpretability in hallucination detection.
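The four pipeline stages named in the summary (claim extraction, query generation, evidence collection, verification) can be sketched end to end as below. This is a minimal illustration, assuming hypothetical `llm` and `search` callables; it is not the JointCQ implementation.

```python
# Minimal sketch of the four-stage detection pipeline. `llm` and `search`
# are hypothetical stand-ins for a language model call and an evidence
# retriever; nothing here is from the JointCQ codebase itself.
from typing import Callable

def detect_hallucinations(
    response: str,
    llm: Callable[[str], str],
    search: Callable[[str], list[str]],
) -> list[dict]:
    """Return one verdict per claim extracted from `response`."""
    # Stages 1-2, done jointly as JointCQ proposes: extract atomic claims
    # and generate a specific query for each in a single pass, so the
    # query generator still sees the full response context.
    raw = llm(
        "Decompose the response into atomic, context-resolved factual "
        "claims, and pair each claim with one specific search query. "
        "Output one tab-separated claim<TAB>query pair per line.\n"
        f"Response: {response}"
    )
    results = []
    for line in raw.splitlines():
        if "\t" not in line:
            continue  # skip malformed lines
        claim, query = line.split("\t", 1)
        evidence = search(query)  # Stage 3: evidence collection
        verdict = llm(            # Stage 4: claim verification
            f"Claim: {claim}\nEvidence: {evidence}\n"
            "Answer SUPPORTED or HALLUCINATED."
        )
        results.append({"claim": claim, "query": query, "verdict": verdict})
    return results
```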

📝 Abstract
Current large language models (LLMs) often suffer from hallucination issues, i.e., generating content that appears factual but is actually unreliable. A typical hallucination detection pipeline involves response decomposition (i.e., claim extraction), query generation, evidence collection (i.e., search or retrieval), and claim verification. However, existing methods exhibit limitations in the first two stages, such as context loss during claim extraction and low specificity in query generation, resulting in degraded performance across the hallucination detection pipeline. In this work, we introduce JointCQ (https://github.com/pku0xff/JointCQ), a joint claim-and-query generation framework designed to construct an effective and efficient claim-query generator. Our framework leverages carefully designed evaluation criteria to filter synthesized training data, and finetunes a language model for joint claim extraction and query generation, providing reliable and informative inputs for downstream search and verification. Experimental results demonstrate that our method outperforms previous methods on multiple open-domain QA hallucination detection benchmarks, advancing the goal of more trustworthy and transparent language model systems.
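The abstract's data-filtering step can be pictured as scoring synthesized claim-query pairs against evaluation criteria and keeping only pairs that clear a threshold before fine-tuning. Below is a sketch under assumed criteria (`faithfulness`, `specificity`) and an illustrative threshold of 0.8; the paper's actual criteria and scoring are likely more elaborate.

```python
# Sketch of filtering synthesized (claim, query) training pairs with
# evaluation criteria. The two criterion functions and the threshold are
# illustrative assumptions, not the paper's exact design.
from dataclasses import dataclass

@dataclass
class Pair:
    claim: str  # atomic claim extracted from a source response
    query: str  # verification query generated for that claim

def faithfulness(pair: Pair, source: str) -> float:
    """Placeholder check: is the claim grounded in the source response?"""
    return 1.0 if pair.claim and pair.claim in source else 0.5

def specificity(pair: Pair) -> float:
    """Placeholder check: is the query specific enough to retrieve evidence?"""
    return min(len(pair.query.split()) / 8.0, 1.0)

def filter_pairs(pairs: list[Pair], source: str,
                 threshold: float = 0.8) -> list[Pair]:
    """Keep only pairs whose mean criterion score clears the threshold."""
    return [
        p for p in pairs
        if (faithfulness(p, source) + specificity(p)) / 2.0 >= threshold
    ]
```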
Problem

Research questions and friction points this paper is trying to address.

Detecting factual hallucinations in large language model outputs
Improving claim extraction and query generation for verification
Enhancing reliability and transparency of language model systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint claim and query generation framework (see the example after this list)
Filtered training data with evaluation criteria
Finetuned model for improved hallucination detection
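To make the first bullet concrete, a hypothetical joint output for a two-fact response might look like the following; the schema is invented for illustration and is not JointCQ's documented format.

```python
# Hypothetical joint claim-query output for the response
# "She won the Nobel Prize in Physics in 1903 and in Chemistry in 1911."
# Each claim is context-resolved (pronouns replaced) and paired with its
# own retrieval query, so downstream search gets self-contained inputs.
joint_output = [
    {
        "claim": "Marie Curie won the Nobel Prize in Physics in 1903.",
        "query": "Marie Curie 1903 Nobel Prize Physics",
    },
    {
        "claim": "Marie Curie won the Nobel Prize in Chemistry in 1911.",
        "query": "Marie Curie 1911 Nobel Prize Chemistry",
    },
]
```

Resolving "she" to "Marie Curie" in both claims is the kind of context preservation that separate-stage extraction tends to lose.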
👥 Authors
Fan Xu
Wangxuan Institute of Computer Technology, Peking University
Huixuan Zhang
Peking University
Natural Language Processing
Zhenliang Zhang
Wangxuan Institute of Computer Technology, Peking University
Jiahao Wang
Trustworthy Technology and Engineering Laboratory, Huawei
Xiaojun Wan
Peking University
Natural Language Processing · Text Mining · Artificial Intelligence