CaseFacts: A Benchmark for Legal Fact-Checking and Precedent Retrieval

📅 2026-01-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing automated fact-checking methods struggle to meet the dynamic, technically demanding, and high-stakes verification requirements of the legal domain, particularly because of the significant semantic gap between layperson assertions and authoritative judicial precedents. To address this challenge, this work introduces CaseFacts, the first legal fact-checking benchmark tailored to U.S. Supreme Court rulings, comprising 6,294 annotated claims labeled as Supported, Refuted, or Overruled, with explicit emphasis on temporal validity and semantic alignment. The authors propose a multi-stage pipeline that leverages large language models (LLMs) for claim generation and employs a semantic similarity–based heuristic to efficiently identify complex overruling relationships among cases. Experimental results show that state-of-the-art LLMs perform poorly on this task and, surprisingly, that open-web retrieval degrades accuracy, underscoring both the unique challenges of legal fact-checking and the value of the proposed benchmark.

📝 Abstract
Automated Fact-Checking has largely focused on verifying general knowledge against static corpora, overlooking high-stakes domains like law where truth is evolving and technically complex. We introduce CaseFacts, a benchmark for verifying colloquial legal claims against U.S. Supreme Court precedents. Unlike existing resources that map formal texts to formal texts, CaseFacts challenges systems to bridge the semantic gap between layperson assertions and technical jurisprudence while accounting for temporal validity. The dataset consists of 6,294 claims categorized as Supported, Refuted, or Overruled. We construct this benchmark using a multi-stage pipeline that leverages Large Language Models (LLMs) to synthesize claims from expert case summaries, employing a novel semantic similarity heuristic to efficiently identify and verify complex legal overrulings. Experiments with state-of-the-art LLMs reveal that the task remains challenging; notably, augmenting models with unrestricted web search degrades performance compared to closed-book baselines due to the retrieval of noisy, non-authoritative precedents. We release CaseFacts to spur research into legal fact verification systems.
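The paper does not publish the details of its semantic similarity heuristic, but the core idea it describes (flagging semantically close case pairs as candidate overrulings, to be verified in a later stage) can be sketched as follows. This is a minimal illustration under stated assumptions: a bag-of-words cosine stands in for whatever embedding model the authors actually use, the `threshold` value is invented, and the example case summaries are paraphrases written for demonstration.

```python
import math
import re
from collections import Counter
from itertools import combinations

def bow_vector(text):
    """Lowercased bag-of-words counts (a stand-in for a real text embedding)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def candidate_overrulings(cases, threshold=0.5):
    """Return pairs of cases whose summaries exceed a similarity threshold.

    High-similarity pairs are *candidates* only; per the paper's pipeline,
    a later stage (e.g. an LLM check or expert review) must confirm an
    actual overruling relation between the two cases.
    """
    vecs = {case_id: bow_vector(summary) for case_id, summary in cases.items()}
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(vecs.items(), 2):
        score = cosine(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 3)))
    return sorted(pairs, key=lambda p: -p[2])

# Toy summaries: only the topically related pair should surface.
cases = {
    "Plessy v. Ferguson": "separate but equal facilities for racial segregation in public accommodations",
    "Brown v. Board": "racial segregation in public schools; separate facilities are inherently unequal",
    "Roe v. Wade": "constitutional right to abortion under the Due Process Clause",
}
print(candidate_overrulings(cases, threshold=0.3))
```

The payoff of such a heuristic is cost: comparing cheap vector similarities over all case pairs and only escalating the high-scoring candidates avoids running an expensive verifier on every possible pair.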
Problem

Research questions and friction points this paper is trying to address.

legal fact-checking
precedent retrieval
semantic gap
temporal validity
colloquial legal claims
Innovation

Methods, ideas, or system contributions that make the work stand out.

legal fact-checking
precedent retrieval
semantic gap
temporal validity
overruling detection