🤖 AI Summary
Current LLM factuality evaluation lacks standardized benchmarks and comparable methodologies, which hinders systematic progress. To address this, we propose OpenFactCheck, an open-source, scalable, and reproducible end-to-end fact-checking framework. Methodologically, it establishes an integrated ecosystem comprising: (i) customizable checker development (CUSTCHECKER), (ii) a fair cross-model evaluation protocol (LLMEVAL), and (iii) quantification of checker reliability against human-annotated data (CHECKEREVAL). It also introduces multi-granularity metrics and a human-in-the-loop verification paradigm, and releases standardized benchmark datasets and tooling. Empirically, OpenFactCheck improves the verifiability of LLM outputs, enhances comparability and reliability across diverse fact-checking systems, and provides a unified infrastructure for assessing the factuality of free-text claims in open domains.
📝 Abstract
The increasing use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. A key difficulty lies in assessing the factuality of free-form responses in open domains. In addition, different papers use disparate evaluation benchmarks and measurements, which makes their results hard to compare and hampers future progress. To mitigate these issues, we propose OpenFactCheck, a unified framework for building customized automatic fact-checking systems, benchmarking their accuracy, evaluating the factuality of LLMs, and verifying claims in a document. OpenFactCheck consists of three modules: (i) CUSTCHECKER, which allows users to easily customize an automatic fact-checker and verify the factual correctness of documents and claims; (ii) LLMEVAL, a unified evaluation framework that fairly assesses an LLM's factuality from various perspectives; and (iii) CHECKEREVAL, an extensible solution for gauging the reliability of automatic fact-checkers' verification results using human-annotated datasets. Data and code are publicly available at https://github.com/yuxiaw/openfactcheck.
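The abstract does not say how CHECKEREVAL scores a checker against human annotations. The following is a minimal sketch of one plausible approach, under stated assumptions: a checker is any function mapping a claim string to a binary verdict (True = supported), human labels are binary, and "false claim" is treated as the positive class because catching errors is the goal. `evaluate_checker`, `CheckerReport`, `lookup_checker`, and `KNOWN_FACTS` are hypothetical names for illustration only, not OpenFactCheck's API, and the toy lookup checker stands in for a real retrieval-and-verification pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical interface (not OpenFactCheck's real API): a checker maps a
# claim to a verdict, True if judged factual and False if judged false.
Checker = Callable[[str], bool]


@dataclass
class CheckerReport:
    accuracy: float
    precision_false: float  # of claims flagged false, share humans also labelled false
    recall_false: float     # of human-labelled false claims, share the checker flagged
    f1_false: float


def evaluate_checker(checker: Checker, claims: Sequence[str],
                     human_labels: Sequence[bool]) -> CheckerReport:
    """Compare a checker's verdicts with human annotations (True = factual)."""
    if len(claims) != len(human_labels):
        raise ValueError("claims and human_labels must have the same length")
    if not claims:
        raise ValueError("need at least one annotated claim")
    preds = [checker(c) for c in claims]
    pairs = list(zip(preds, human_labels))
    # Positive class = "false claim" (an assumption of this sketch).
    tp = sum(1 for p, y in pairs if not p and not y)  # correctly flagged as false
    fp = sum(1 for p, y in pairs if not p and y)      # true claim wrongly flagged
    fn = sum(1 for p, y in pairs if p and not y)      # false claim missed
    correct = sum(1 for p, y in pairs if p == y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return CheckerReport(correct / len(claims), precision, recall, f1)


# Toy checker (hypothetical): accepts only claims found in a tiny fact set.
KNOWN_FACTS = {"Paris is the capital of France."}


def lookup_checker(claim: str) -> bool:
    return claim in KNOWN_FACTS


claims = [
    "Paris is the capital of France.",
    "The Moon is made of cheese.",
    "Mount Everest is the highest mountain above sea level.",
    "Berlin is the capital of Spain.",
]
human_labels = [True, False, True, False]  # illustrative labels, not real data

report = evaluate_checker(lookup_checker, claims, human_labels)
print(report)
```

Here the toy checker misses the Everest claim because it is absent from its fact set, so it flags a true claim as false. That error lowers precision on false claims while recall stays perfect, which is the kind of reliability gap a human-annotated benchmark is meant to expose.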