🤖 AI Summary
To address the pervasive hallucination problem in large language model (LLM) outputs and the lack of standardized, open-domain factual evaluation protocols, this paper introduces OpenFactCheck, an open-source, end-to-end unified factuality evaluation framework. OpenFactCheck is built around three collaborating modules (RESPONSEEVAL, LLMEVAL, and CHECKEREVAL) that assess factual correctness at the response, model, and fact-checker levels, respectively, unifying interfaces, metrics, and benchmarking procedures. Implemented in Python, it integrates claim extraction, evidence retrieval, and claim verification, enabling customizable verification pipelines. It is publicly available as a PyPI package, on GitHub, and via a web service. Extensive evaluation across multiple benchmarks demonstrates significant improvements in assessment consistency and reproducibility. OpenFactCheck has been widely adopted in the LLM safety and trustworthiness research community.
📝 Abstract
The increased use of large language models (LLMs) across a variety of real-world applications calls for automatic tools to check the factual accuracy of their outputs, as LLMs often hallucinate. This is difficult because it requires assessing the factuality of free-form, open-domain responses. While there has been a lot of research on this topic, different papers use different evaluation benchmarks and measures, which makes them hard to compare and hampers future progress. To mitigate these issues, we developed OpenFactCheck, a unified framework with three modules: (i) RESPONSEEVAL, which allows users to easily customize an automatic fact-checking system and to assess the factuality of all claims in an input document using that system, (ii) LLMEVAL, which assesses the overall factuality of an LLM, and (iii) CHECKEREVAL, a module to evaluate automatic fact-checking systems. OpenFactCheck is open-sourced (https://github.com/mbzuai-nlp/openfactcheck) and publicly released as a Python library (https://pypi.org/project/openfactcheck/) and also as a web service (http://app.openfactcheck.com). A video describing the system is available at https://youtu.be/-i9VKL0HleI.
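To make the idea of a customizable fact-checking pipeline concrete, here is a minimal sketch of the three stages (claim extraction, evidence retrieval, claim verification) wired together with swappable components. It is not the OpenFactCheck API: every name below (`Verdict`, `split_into_claims`, `make_keyword_retriever`, `substring_verifier`, `check_document`, `factuality_score`) is hypothetical, and the toy extractor, retriever, and verifier stand in for the much stronger components a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    """Result of checking one claim (hypothetical structure)."""
    claim: str
    evidence: List[str]
    supported: bool


def split_into_claims(document: str) -> List[str]:
    # Toy claim extractor: treat each period-delimited sentence as one claim.
    return [s.strip() for s in document.split(".") if s.strip()]


def make_keyword_retriever(corpus: List[str]) -> Callable[[str], List[str]]:
    # Toy retriever: return corpus entries sharing at least two tokens with the claim.
    def retrieve(claim: str) -> List[str]:
        words = set(claim.lower().split())
        return [doc for doc in corpus
                if len(words & set(doc.lower().split())) >= 2]
    return retrieve


def substring_verifier(claim: str, evidence: List[str]) -> bool:
    # Toy verifier: a claim counts as supported only if some evidence contains it verbatim.
    return any(claim.lower() in doc.lower() for doc in evidence)


def check_document(
    document: str,
    extract: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    verify: Callable[[str, List[str]], bool],
) -> List[Verdict]:
    # Run the three stages in order; each stage is a plain callable and can be swapped.
    verdicts = []
    for claim in extract(document):
        evidence = retrieve(claim)
        verdicts.append(Verdict(claim, evidence, verify(claim, evidence)))
    return verdicts


def factuality_score(verdicts: List[Verdict]) -> float:
    # Fraction of claims judged supported; 0.0 for a document with no claims.
    if not verdicts:
        return 0.0
    return sum(v.supported for v in verdicts) / len(verdicts)


corpus = ["Paris is the capital of France.", "The Nile flows through Egypt."]
document = "Paris is the capital of France. The Nile flows through Brazil."
verdicts = check_document(
    document, split_into_claims, make_keyword_retriever(corpus), substring_verifier
)
for v in verdicts:
    print(v.claim, "->", "supported" if v.supported else "not supported")
print("score:", factuality_score(verdicts))
```

Passing the stages as plain callables is one simple way to get the kind of customization the abstract describes: replacing the retriever or verifier does not require touching `check_document`.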