Optimizing Chain-of-Thought Confidence via Topological and Dirichlet Risk Analysis

📅 2025-11-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing large language models (LLMs) exhibit poor confidence calibration in chain-of-thought (CoT) reasoning, frequently assigning high confidence to incorrect predictions. Method: We propose EDTR, a novel decoding strategy that, for the first time, models CoT reasoning paths from a high-dimensional geometric perspective: CoT sequences are embedded as high-dimensional vectors; topological analysis extracts eight risk features characterizing distributional cohesion and consistency; and Dirichlet-based uncertainty quantification enables multi-path uncertainty awareness. Contribution/Results: EDTR substantially improves calibration, achieving an average Expected Calibration Error (ECE) of 0.287 across four reasoning benchmarks (composite score: 0.672). On GSM8K, ECE drops to 0.107; on AIME, EDTR attains perfect accuracy. Both calibration fidelity and generalization surpass state-of-the-art methods.

📝 Abstract
Chain-of-thought (CoT) prompting enables Large Language Models to solve complex problems, but deploying these models safely requires reliable confidence estimates, a capability where existing methods suffer from poor calibration and severe overconfidence on incorrect predictions. We propose Enhanced Dirichlet and Topology Risk (EDTR), a novel decoding strategy that combines topological analysis with Dirichlet-based uncertainty quantification to measure LLM confidence across multiple reasoning paths. EDTR treats each CoT as a vector in high-dimensional space and extracts eight topological risk features capturing the geometric structure of reasoning distributions: tighter, more coherent clusters indicate higher confidence while dispersed, inconsistent paths signal uncertainty. We evaluate EDTR against three state-of-the-art calibration methods across four diverse reasoning benchmarks spanning olympiad-level mathematics (AIME), grade school math (GSM8K), commonsense reasoning, and stock price prediction [zhang2025aime; cobbe2021training; talmor-etal-2019-commonsenseqa; yahoo_finance]. EDTR achieves 41% better calibration than competing methods with an average ECE of 0.287 and the best overall composite score of 0.672, while notably achieving perfect accuracy on AIME and exceptional calibration on GSM8K with an ECE of 0.107, domains where baselines exhibit severe overconfidence. Our work provides a geometric framework for understanding and quantifying uncertainty in multi-step LLM reasoning, enabling more reliable deployment where calibrated confidence estimates are essential.
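The abstract's geometric intuition (tight, coherent clusters of reasoning-path embeddings imply higher confidence; dispersed paths imply uncertainty) can be sketched with a few dispersion statistics. The paper's eight specific topological risk features are not enumerated in this summary, so the feature names and formulas below are illustrative assumptions, not EDTR's actual definitions:

```python
import numpy as np

def dispersion_risk_features(embeddings):
    """Illustrative dispersion statistics over CoT path embeddings.

    `embeddings` is an (n_paths, dim) array, one vector per sampled
    reasoning path. Tighter clusters yield smaller values, mirroring
    the intuition that coherent paths indicate higher confidence.
    NOTE: these are assumed stand-ins, not the paper's eight features.
    """
    X = np.asarray(embeddings, dtype=float)
    centroid = X.mean(axis=0)
    # distance of each path embedding to the cluster centroid
    centroid_dists = np.linalg.norm(X - centroid, axis=1)
    # all pairwise distances between path embeddings
    diffs = X[:, None, :] - X[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=-1)
    n = len(X)
    return {
        "mean_centroid_dist": centroid_dists.mean(),
        "max_centroid_dist": centroid_dists.max(),
        # diagonal is zero, so divide by the number of off-diagonal pairs
        "mean_pairwise_dist": pairwise.sum() / (n * (n - 1)),
        "centroid_dist_std": centroid_dists.std(),
    }
```

A risk score built from such features would be low for a tight cluster of near-identical reasoning paths and high when sampled paths scatter across embedding space.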
Problem

Research questions and friction points this paper is trying to address.

Improving confidence calibration for Chain-of-Thought reasoning in LLMs
Addressing overconfidence in incorrect predictions through geometric analysis
Quantifying uncertainty across multiple reasoning paths using topological features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining topological analysis with Dirichlet uncertainty quantification
Extracting topological risk features from reasoning path distributions
Achieving better calibration via geometric framework for uncertainty
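The Dirichlet side of the method can be sketched in an evidential style: treat the counts of each distinct final answer across sampled CoT paths as evidence for a Dirichlet distribution, and read confidence off its mean. This is an assumed formulation for illustration; the paper's exact estimator may differ:

```python
from collections import Counter

def dirichlet_confidence(answers, prior=1.0):
    """Illustrative Dirichlet-based confidence over sampled CoT answers.

    Each distinct answer's count acts as evidence; the Dirichlet mean
    gives the confidence of the top answer, and the total evidence
    mass yields a vacuity-style uncertainty. Assumed formulation, not
    necessarily EDTR's.
    """
    counts = Counter(answers)
    k = len(counts)                      # number of distinct answers
    alphas = {a: c + prior for a, c in counts.items()}
    strength = sum(alphas.values())      # total Dirichlet evidence
    best = max(alphas, key=alphas.get)
    confidence = alphas[best] / strength # expected prob. of top answer
    vacuity = k * prior / strength       # uncertainty from weak evidence
    return best, confidence, vacuity
```

For example, nine paths agreeing on one answer with a single dissenter yields high confidence, while an even split between answers drives confidence toward 0.5 and uncertainty up.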
Abhishek More
Algoverse AI Research
Anthony Zhang
Algoverse AI Research
Nicole Bonilla
Algoverse AI Research
Ashvik Vivekan
Algoverse AI Research
Kevin Zhu
PhD, Stanford University; Professor of Business+Technology, University of California, San Diego
IT, data, e-commerce, software, digital transformation
Parham Sharafoleslami
Algoverse AI Research
Maheep Chaudhary
Independent Research
Causal Inference, Machine Learning