OpenFactCheck: Building, Benchmarking Customized Fact-Checking Systems and Evaluating the Factuality of Claims and LLMs

📅 2024-05-09

📈 Citations: 7

✨ Influential: 1

🤖 AI Summary

Current LLM factuality evaluation lacks standardized benchmarks and comparable methodologies, which hinders systematic progress. To address this, we propose OpenFactCheck, the first open-source, scalable, and reproducible end-to-end fact-checking framework. Methodologically, it establishes an integrated ecosystem with three components: (i) customizable checker development (CUSTCHECKER), (ii) a fair cross-model evaluation protocol (LLMEVAL), and (iii) quantification of checker reliability against human annotations (CHECKEREVAL). It also introduces multi-granularity metrics and a human-in-the-loop verification paradigm, and releases standardized benchmark datasets and tooling. Empirically, OpenFactCheck significantly improves the verifiability of LLM outputs, enhances comparability and reliability across diverse fact-checking systems, and provides a unified infrastructure for factuality assessment of open-domain free-text claims.

📝 Abstract

The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. Difficulties lie in assessing the factuality of free-form responses in open domains. Also, different papers use disparate evaluation benchmarks and measurements, which makes them hard to compare and hampers future progress. To mitigate these issues, we propose OpenFactCheck, a unified framework for building customized automatic fact-checking systems, benchmarking their accuracy, evaluating the factuality of LLMs, and verifying claims in a document. OpenFactCheck consists of three modules: (i) CUSTCHECKER allows users to easily customize an automatic fact-checker and verify the factual correctness of documents and claims, (ii) LLMEVAL is a unified evaluation framework that fairly assesses the factuality of LLMs from various perspectives, and (iii) CHECKEREVAL is an extensible solution for gauging the reliability of automatic fact-checkers' verification results using human-annotated datasets. Data and code are publicly available at https://github.com/yuxiaw/openfactcheck.
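To make the CHECKEREVAL idea concrete, the sketch below scores an automatic fact-checker against human-annotated labels. It is only an illustration of the general technique and does not use the OpenFactCheck package: the names `evaluate_checker`, `CheckerReport`, and `toy_checker` are hypothetical, and the choice of accuracy, precision, recall, and F1 (with "factual" as the positive class) is an assumption rather than the metric set the paper reports.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class CheckerReport:
    """Agreement between a checker's verdicts and human labels."""
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate_checker(
    checker: Callable[[str], bool],
    claims: Sequence[str],
    human_labels: Sequence[bool],
) -> CheckerReport:
    """Score a checker (hypothetical interface: claim text -> True if factual).

    Assumption: True ("factual") is treated as the positive class.
    """
    if len(claims) != len(human_labels):
        raise ValueError("claims and human_labels must have the same length")
    if not claims:
        raise ValueError("at least one claim is required")

    predictions = [checker(claim) for claim in claims]
    pairs = list(zip(predictions, human_labels))

    tp = sum(1 for pred, gold in pairs if pred and gold)
    fp = sum(1 for pred, gold in pairs if pred and not gold)
    fn = sum(1 for pred, gold in pairs if not pred and gold)
    tn = sum(1 for pred, gold in pairs if not pred and not gold)

    # Guard against empty denominators by reporting 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs)
    return CheckerReport(accuracy, precision, recall, f1)


def toy_checker(claim: str) -> bool:
    """Placeholder checker: accepts any claim that mentions 'Paris'."""
    return "Paris" in claim


if __name__ == "__main__":
    claims = [
        "Paris is the capital of France.",
        "Paris is the capital of Spain.",
        "Water boils at 100 degrees Celsius at sea level.",
        "The Moon is made of cheese.",
    ]
    human_labels = [True, False, True, False]
    print(evaluate_checker(toy_checker, claims, human_labels))
```

Because the checker is passed in as a plain callable, different customized checkers can be scored against the same annotated set, which is the kind of comparison the abstract describes.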

Problem

Research questions and friction points this paper is trying to address.

Building customizable systems to verify factual accuracy of LLM outputs

Benchmarking fact-checking methods with unified evaluation frameworks

Assessing factuality of free-form claims across diverse domains

Innovation

Methods, ideas, or system contributions that make the work stand out.

Customizable automatic fact-checker for document verification

Unified framework for evaluating LLM factuality from multiple perspectives

Extensible solution for assessing fact-checker reliability with human data

🔎 Similar Papers

OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs