On Finding Inconsistencies in Documents

📅 2025-12-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of detecting latent inconsistencies (logical, factual, and numerical) in long documents from academic, legal, and financial domains. To this end, we introduce FIND, the first cross-domain, expert-annotated benchmark for inconsistency detection, in which each example is a long document with an inconsistency manually inserted by a domain expert. Methodologically, we propose a zero-shot and few-shot framework leveraging large language models (e.g., GPT-5) for inconsistency localization and explanation, validated through a multi-round expert verification protocol. Our contributions are threefold: (1) we establish the first high-quality, multi-domain benchmark for evaluating inconsistency detection; (2) we demonstrate that LMs can proactively uncover non-obvious contradictions overlooked by authors, identifying 136 previously undetected inconsistencies across 50 arXiv papers with a 69% adoption rate by domain experts, thereby expanding the trusted application scope of LMs in deep document auditing; (3) we show that while GPT-5 recovers 64% of the inserted inconsistencies in FIND, even the best models still miss nearly half of them overall, highlighting concrete avenues for future improvement.
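The localization framework described above can be pictured as comparing passages of a long document against each other and asking a model to flag contradictions. The sketch below illustrates that shape only; the chunking scheme, the `check_pair` stub, and all names are hypothetical illustrations, not the paper's actual pipeline, and the LM call is left as a placeholder.

```python
# Hypothetical sketch of an LM-based consistency audit over document chunks.
# The model judgment is stubbed out; a real system would prompt an LM
# (e.g., gpt-5) to compare the two passages and describe contradictions.

from itertools import combinations


def chunk(text: str, size: int = 400, overlap: int = 100) -> list[str]:
    """Split a document into overlapping character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def check_pair(a: str, b: str) -> list[str]:
    """Stub for an LM judgment: return descriptions of contradictions
    between passages a and b (empty list if they are consistent)."""
    return []  # placeholder for the actual model call


def audit(document: str) -> list[str]:
    """Collect contradiction reports across all chunk pairs."""
    findings: list[str] = []
    for a, b in combinations(chunk(document), 2):
        findings.extend(check_pair(a, b))
    return findings
```

Pairwise comparison is quadratic in the number of chunks, which is one reason long, technical documents make this task hard in practice.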

📝 Abstract
Professionals in academia, law, and finance audit their documents because inconsistencies can result in monetary, reputational, and scientific costs. Language models (LMs) have the potential to dramatically speed up this auditing process. To understand their abilities, we introduce a benchmark, FIND (Finding INconsistencies in Documents), where each example is a document with an inconsistency inserted manually by a domain expert. Despite the documents being long, technical, and complex, the best-performing model (gpt-5) recovered 64% of the inserted inconsistencies. Surprisingly, gpt-5 also found undiscovered inconsistencies present in the original documents. For example, on 50 arXiv papers, we judged 136 out of 196 of the model's suggestions to be legitimate inconsistencies missed by the original authors. However, despite these findings, even the best models miss almost half of the inconsistencies in FIND, demonstrating that inconsistency detection is still a challenging task.
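The headline recovery numbers follow mechanically once a model's reported inconsistencies are matched against the gold insertions. A minimal sketch, assuming matching is a simple set intersection over inconsistency identifiers (the paper's actual protocol relies on expert judgment, and the sample counts below are illustrative):

```python
def detection_stats(gold: set[str], predicted: set[str]) -> dict[str, float]:
    """Score a model's reported inconsistencies against the gold insertions.
    Detection rate is the fraction of gold items recovered; the
    false-negative rate is its complement."""
    found = gold & predicted
    rate = len(found) / len(gold) if gold else 0.0
    return {
        "detection_rate": rate,
        "false_negative_rate": 1.0 - rate,
        "num_found": len(found),
    }


# Illustrative run: a model that recovers 64 of 100 inserted inconsistencies.
stats = detection_stats({f"inc{i}" for i in range(100)},
                        {f"inc{i}" for i in range(64)})
```

Here `detection_stats` would report a 0.64 detection rate, matching the paper's headline figure for its best model.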
Problem

Research questions and friction points this paper is trying to address.

Detecting inconsistencies in long, technical documents
Evaluating language models' ability to find document errors
Addressing the challenges of automated inconsistency detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark FIND evaluates inconsistency detection in documents
GPT-5 identifies both inserted and original document inconsistencies
Model detects legitimate inconsistencies missed by human authors