Conformal Language Model Reasoning with Coherent Factuality

📅 2025-05-21
🏛️ International Conference on Learning Representations
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address insufficient factual grounding of language models in logical reasoning, this work introduces *coherent factuality*—a notion requiring joint verification of reasoning steps under their premise-dependency structure, rather than isolated statement evaluation. Methodologically, we construct a *deducibility graph* to explicitly model structural dependencies within reasoning chains and apply *split conformal prediction* to calibrate subgraphs, enabling chain-level confidence control. This constitutes the first framework that formally defines and guarantees coherent factuality, moving beyond conventional assertion-level evaluation paradigms. On the MATH and FELM benchmarks, our approach achieves 90% coherent factuality coverage while preserving over 80% of original claims, substantially improving both correctness and verifiability of reasoning outputs.
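The split conformal step described above can be illustrated with a minimal sketch. The idea: score each claim for nonconformity (higher = less likely factual), use a held-out calibration set of claims known to be factual to pick a cutoff at a finite-sample-corrected quantile, then retain only claims scoring at or below that cutoff. The function name and scoring setup here are hypothetical simplifications, not the paper's exact procedure, which calibrates over subgraphs rather than individual claims.

```python
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Split conformal calibration (illustrative sketch).

    cal_scores: nonconformity scores for calibration claims known
    to be factual. Returns a cutoff tau such that a new factual
    claim's score falls at or below tau with probability ~(1 - alpha).
    """
    n = len(cal_scores)
    # Finite-sample-corrected quantile rank: ceil((n + 1)(1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Calibration set too small for this coverage level:
        # the conservative choice is to retain everything.
        return float("inf")
    return sorted(cal_scores)[k - 1]
```

At coverage level 90% (alpha = 0.1) with 100 calibration scores, the cutoff is the 91st smallest score, slightly above the naive 90th percentile to account for finite-sample error.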

📝 Abstract
Language models are increasingly being used in important decision pipelines, so ensuring the correctness of their outputs is crucial. Recent work has proposed evaluating the "factuality" of claims decomposed from a language model generation and applying conformal prediction techniques to filter out those claims that are not factual. This can be effective for tasks such as information retrieval, where constituent claims may be evaluated in isolation for factuality, but is not appropriate for reasoning tasks, as steps of a logical argument can be evaluated for correctness only within the context of the claims that precede them. To capture this, we define "coherent factuality" and develop a conformal-prediction-based method to guarantee coherent factuality for language model outputs. Our approach applies split conformal prediction to subgraphs within a "deducibility" graph that represents the steps of a reasoning problem. We evaluate our method on mathematical reasoning problems from the MATH and FELM datasets and find that our algorithm consistently produces correct and substantiated orderings of claims, achieving coherent factuality across target coverage levels. Moreover, we achieve 90% factuality on our stricter definition while retaining 80% or more of the original claims, highlighting the utility of our deducibility-graph-guided approach.
Problem

Research questions and friction points this paper is trying to address.

Ensuring correctness of language model outputs in decision pipelines
Evaluating coherent factuality for logical reasoning tasks
Guaranteeing substantiated claim orderings in reasoning problems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conformal prediction for coherent factuality
Deducibility graph guides reasoning steps
Split conformal prediction on subgraphs
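The dependency constraint these bullets describe can be sketched as follows: a claim survives filtering only if its own score clears the calibrated threshold *and* all of its premises survive, so every retained step remains deducible from retained steps. This is a hypothetical helper illustrating the coherence constraint on a deducibility graph (assumed to be a DAG), not the paper's subgraph-calibration algorithm.

```python
def coherent_filter(scores, premises, tau):
    """Dependency-respecting claim filtering (illustrative sketch).

    scores:   {claim: nonconformity score}
    premises: {claim: [claims it is deduced from]}
    tau:      calibrated threshold from split conformal prediction

    Keeps a claim only if its score is at most tau and every one of
    its premises was also kept, so the retained set is coherent.
    """
    # Order claims so premises come before the claims that use them
    # (a topological order of the deducibility graph).
    order, visited = [], set()

    def visit(claim):
        if claim in visited:
            return
        visited.add(claim)
        for p in premises.get(claim, []):
            visit(p)
        order.append(claim)

    for claim in scores:
        visit(claim)

    kept = set()
    for claim in order:
        if scores[claim] <= tau and all(p in kept for p in premises.get(claim, [])):
            kept.add(claim)
    return kept
```

Note that a claim with a low score is still dropped if any premise it depends on was dropped; this is what distinguishes coherent filtering from the isolated per-claim filtering the paper argues against.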