🤖 AI Summary
Existing confidence estimation methods for large language models (LLMs) are primarily designed for factual question answering and struggle to generalize to reasoning tasks. To address this, we propose a training-free, graph-structured approach for quantifying confidence in LLM-generated reasoning paths: each path is modeled as a directed graph, and confidence is computed by jointly leveraging node centrality, path convergence, and edge-weighting strategies. This work is the first to explicitly incorporate graph structure into confidence modeling for LLM reasoning paths, offering both task-agnostic applicability and inherent interpretability. Extensive experiments across two state-of-the-art LLMs and three diverse reasoning benchmarks demonstrate substantial improvements in calibration accuracy and cross-task generalization. Moreover, the method consistently improves performance in downstream applications such as adaptive reasoning termination and answer re-ranking, validating its practical utility and robustness.
📝 Abstract
Confidence estimation is essential for the reliable deployment of large language models (LLMs). Existing methods are primarily designed for factual QA tasks and often fail to generalize to reasoning tasks. To address this gap, we propose a set of training-free, graph-based confidence estimation methods tailored to reasoning tasks. Our approach models reasoning paths as directed graphs and estimates confidence by exploiting graph properties such as centrality, path convergence, and path weighting. Experiments with two LLMs on three reasoning datasets demonstrate improved confidence estimation and enhanced performance on two downstream tasks.
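The abstract does not spell out how the three graph signals are combined, so the following is a minimal illustrative sketch, not the paper's actual method. It assumes several sampled reasoning paths (each a sequence of step strings ending in a final answer), merges them into one directed graph with frequency-based edge weights, and scores each candidate answer by a simple average of path convergence (fraction of paths ending there) and weighted in-degree centrality. The function name `path_confidence` and the equal 0.5/0.5 weighting are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict

def path_confidence(paths):
    """Sketch of graph-based answer confidence from sampled reasoning paths.

    paths: list of reasoning paths, each a list of step strings whose
           last element is the final answer.
    Returns a dict mapping each candidate answer to a confidence score.
    """
    edge_weight = Counter()   # edge weighting: how many paths use each transition
    terminal = Counter()      # path convergence: how many paths end at each answer
    for path in paths:
        for u, v in zip(path, path[1:]):
            edge_weight[(u, v)] += 1
        terminal[path[-1]] += 1

    # Weighted in-degree as a simple stand-in for node centrality.
    in_weight = defaultdict(float)
    for (u, v), w in edge_weight.items():
        in_weight[v] += w

    total_paths = len(paths)
    total_in = sum(in_weight[a] for a in terminal) or 1.0
    return {
        a: 0.5 * (terminal[a] / total_paths)   # convergence term
           + 0.5 * (in_weight[a] / total_in)   # centrality term
        for a in terminal
    }

# Example: three sampled paths for one question, two candidate answers.
paths = [
    ["question", "step_a", "18"],
    ["question", "step_a", "18"],
    ["question", "step_b", "20"],
]
conf = path_confidence(paths)  # "18" scores higher: more paths converge on it
```

Because two of the three paths converge on "18" through a shared intermediate step, both the convergence and centrality terms favor it, so `conf["18"] > conf["20"]`. The real method presumably uses richer centrality measures and learned-free weighting schemes than this two-term average.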