FaStFACT: Faster, Stronger Long-Form Factuality Evaluations in LLMs

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing long-form factuality evaluation methods rely on claim decomposition and evidence retrieval but suffer from pipeline complexity, low efficiency, inaccurate claim extraction, and fragmented evidence. This paper proposes FaStFACT, a framework that couples chunk-level claim extraction with document-level joint evidence verification. It introduces a confidence-based pre-verification mechanism to cut futile retrievals, and it integrates selective web crawling with cross-page evidence aggregation, markedly improving evidence sufficiency and alignment with human judgments. Evaluated on human-annotated benchmarks, FaStFACT substantially outperforms state-of-the-art methods in both evaluation accuracy (human alignment) and inference/search efficiency. To the authors' knowledge, it is the first approach to jointly achieve high accuracy and high efficiency in long-form factuality evaluation.

📝 Abstract
Evaluating the factuality of long-form generations from Large Language Models (LLMs) remains challenging due to accuracy issues and costly human assessment. Prior efforts attempt this by decomposing text into claims, searching for evidence, and verifying claims, but suffer from critical drawbacks: (1) inefficiency due to complex pipeline components unsuitable for long LLM outputs, and (2) ineffectiveness stemming from inaccurate claim sets and insufficient evidence collection of one-line snippets. To address these limitations, we propose FaStFACT, a fast and strong evaluation framework that achieves the highest alignment with human evaluation and efficiency among existing baselines. FaStFACT first employs chunk-level claim extraction integrated with confidence-based pre-verification, significantly reducing the cost of web searching and inference calling while ensuring reliability. For searching and verification, it collects document-level evidence from crawled webpages and selectively retrieves it during verification, addressing the evidence insufficiency problem in previous pipelines. Extensive experiments based on an aggregated and manually annotated benchmark demonstrate the reliability of FaStFACT in both efficiently and effectively evaluating the factuality of long-form LLM generations. Code and benchmark data are available at https://github.com/Yingjia-Wan/FastFact.
Problem

Research questions and friction points this paper is trying to address.

Evaluating the factuality of long-form LLM outputs both accurately and efficiently
Overcoming inefficient claim-decomposition and evidence-collection pipelines
Addressing insufficient evidence snippets and the high cost of human assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chunk-level claim extraction with pre-verification
Document-level evidence collection from webpages
Selective retrieval during verification process
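The three innovations above can be sketched as a single pipeline. This is a minimal illustrative sketch, not the paper's implementation: all function names, the confidence threshold, and the keyword-overlap "verification" are assumptions standing in for the LLM and web-search calls the real system makes.

```python
# Hypothetical sketch of a FaStFACT-style pipeline.
# Real LLM extraction and web crawling are stubbed with toy heuristics.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    text: str
    confidence: float            # extractor's self-reported confidence
    verdict: Optional[str] = None

def chunk(text: str, size: int = 80) -> list:
    """Split a long output into word chunks (chunk-level, not per-sentence, extraction)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def extract_claims(chunk_text: str) -> list:
    """Stub extractor: one claim per sentence, with a toy confidence score."""
    return [Claim(text=s.strip(), confidence=0.9 if " is " in f" {s} " else 0.4)
            for s in chunk_text.split(".") if s.strip()]

def pre_verify(claims: list, threshold: float = 0.5):
    """Confidence-based pre-verification: high-confidence claims skip web search,
    which is where the cost savings in searching/inference would come from."""
    accepted = [c for c in claims if c.confidence >= threshold]
    pending = [c for c in claims if c.confidence < threshold]
    for c in accepted:
        c.verdict = "supported (pre-verified)"
    return accepted, pending

def verify_with_evidence(claim: Claim, documents: list) -> Claim:
    """Selective retrieval over document-level evidence; here a naive
    keyword-overlap check stands in for real retrieval and judgment."""
    keywords = claim.text.split()[:3]
    support = any(all(w.lower() in doc.lower() for w in keywords)
                  for doc in documents)
    claim.verdict = "supported" if support else "unsupported"
    return claim
```

A usage example: extract claims from one chunk, let confident claims bypass search, and verify the rest against crawled full documents rather than one-line snippets.

```python
docs = ["The Eiffel Tower is in Paris, France."]
claims = extract_claims("The Eiffel Tower is in Paris. He probably built a model")
accepted, pending = pre_verify(claims)   # only the low-confidence claim needs search
for c in pending:
    verify_with_evidence(c, docs)
```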