FactReasoner: A Probabilistic Approach to Long-Form Factuality Assessment for Large Language Models

📅 2025-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) often generate factually unreliable content, limiting their deployment in high-stakes, precision-critical applications. To address this, the authors propose FactReasoner, a probabilistic reasoning framework for assessing the factuality of long-form generated text. It decomposes a model's output into atomic propositions, retrieves external evidence for each from a knowledge source, uses pretrained language models to detect entailment or contradiction between each proposition and its retrieved contexts, and builds a joint probabilistic graphical model from these relations to infer proposition-level support scores via posterior inference. By encoding logical relations as probability distributions rather than relying on heuristic prompting or binary classification, FactReasoner yields an interpretable, probabilistic account of factual support. Experiments on labeled and unlabeled benchmark datasets show that FactReasoner considerably outperforms state-of-the-art prompting-based approaches, improving both factual precision and recall.

📝 Abstract
Large language models (LLMs) have demonstrated vast capabilities on generative tasks in recent years, yet they struggle with guaranteeing the factual correctness of the generated content. This makes these models unreliable in realistic situations where factually accurate responses are expected. In this paper, we propose FactReasoner, a new factuality assessor that relies on probabilistic reasoning to assess the factuality of a long-form generated response. Specifically, FactReasoner decomposes the response into atomic units, retrieves relevant contexts for them from an external knowledge source, and constructs a joint probability distribution over the atoms and contexts using probabilistic encodings of the logical relationships (entailment, contradiction) between the textual utterances corresponding to the atoms and contexts. FactReasoner then computes the posterior probability of whether atomic units in the response are supported by the retrieved contexts. Our experiments on labeled and unlabeled benchmark datasets demonstrate clearly that FactReasoner improves considerably over state-of-the-art prompt-based approaches in terms of both factual precision and recall.
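The per-atom posterior computation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the naive sentence-splitting decomposition, and the odds-form Bayesian update over independent context signals are all assumptions made here for clarity.

```python
# Hypothetical sketch of FactReasoner's atom-level posterior scoring.
# All names and the independence assumption are illustrative only.

def decompose_atoms(response: str) -> list[str]:
    # Stand-in decomposition: the paper uses a more careful split of the
    # response into atomic units, typically via an LLM.
    return [s.strip() for s in response.split(".") if s.strip()]

def posterior_support(prior: float, rels: list[tuple[str, float]]) -> float:
    """Bayesian update of P(atom is supported) given per-context signals.

    rels: (relation, probability) pairs, where relation is 'entail' or
    'contradict' and probability is the NLI model's confidence in it.
    """
    # Odds-form Bayes: each context contributes a likelihood ratio,
    # treating context signals as conditionally independent.
    odds = prior / (1.0 - prior)
    for relation, p in rels:
        lr = p / (1.0 - p) if relation == "entail" else (1.0 - p) / p
        odds *= lr
    return odds / (1.0 + odds)

response = "Paris is the capital of France. The Seine flows through Berlin."
atoms = decompose_atoms(response)
# Pretend NLI signals: evidence entails the first atom, contradicts the second.
signals = [[("entail", 0.95)], [("contradict", 0.9)]]
scores = [posterior_support(0.5, s) for s in signals]
```

With a uniform prior, a single entailing context at confidence 0.95 pushes the atom's support probability to 0.95, while a contradicting context at 0.9 pulls it down to 0.1; atoms below a chosen threshold would be flagged as unsupported.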
Problem

Research questions and friction points this paper is trying to address.

Assessing factual accuracy in LLM-generated content
Decomposing long-form responses for probabilistic evaluation
Improving precision and recall in factuality assessments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic reasoning for factuality
Decomposition into atomic units
Joint probability distribution construction
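One way to read "probabilistic encodings of logical relationships" is as conditional probability tables in a small graphical model: an entailment relation between a context and an atom becomes a factor over the two boolean variables. The toy factor below is an assumption for illustration, not the paper's actual construction.

```python
# Toy illustration (not the paper's encoding): an entailment relation
# C ⊨ A expressed as a factor P(A=true | C) over boolean variables.

def entail_factor(strength: float) -> dict[bool, float]:
    # If the context holds, the atom is very likely true; if the context
    # does not hold, the atom's truth is left at chance.
    return {True: strength, False: 0.5}

def marginal_atom(p_context: float, factor: dict[bool, float]) -> float:
    # Marginalize the context out: P(A) = sum_c P(A | C=c) P(C=c).
    return p_context * factor[True] + (1.0 - p_context) * factor[False]

f = entail_factor(0.98)
p_atom = marginal_atom(0.9, f)  # context itself believed with probability 0.9
```

Chaining many such factors over all atoms and contexts yields the joint distribution over which posterior inference is run.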
Radu Marinescu
IBM Research
Artificial Intelligence
Debarun Bhattacharjya
Researcher, IBM T.J. Watson Research Center
Artificial Intelligence, Decision Analysis, Machine Learning, Probabilistic Modeling
Junkyu Lee
IBM
Artificial Intelligence, Graphical Models, Heuristic Search, Planning
Tigran Tchrakian
IBM Research
Javier Carnerero Cano
IBM Research
Yufang Hou
IBM Research, IT:U - Interdisciplinary Transformation University Austria
Elizabeth Daly
IBM Research
Alessandra Pascale
IBM Research