Optimizing Chain-of-Thought Confidence via Topological and Dirichlet Risk Analysis

📅 2025-11-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing large language models (LLMs) exhibit poor confidence calibration in chain-of-thought (CoT) reasoning, frequently assigning high confidence to incorrect predictions. Method: We propose EDTR, a novel decoding strategy that, for the first time, models CoT reasoning paths from a high-dimensional geometric perspective: CoT sequences are embedded as high-dimensional vectors; topological analysis extracts eight risk features characterizing distributional cohesion and consistency; and Dirichlet-based uncertainty quantification enables multi-path uncertainty awareness. Contribution/Results: EDTR substantially improves calibration, achieving an average Expected Calibration Error (ECE) of 0.287 across four reasoning benchmarks (composite score: 0.672). On GSM8K, ECE drops to 0.107; on AIME, EDTR attains perfect accuracy. Both calibration fidelity and generalization surpass state-of-the-art methods.

📝 Abstract
Chain-of-thought (CoT) prompting enables Large Language Models to solve complex problems, but deploying these models safely requires reliable confidence estimates, a capability where existing methods suffer from poor calibration and severe overconfidence on incorrect predictions. We propose Enhanced Dirichlet and Topology Risk (EDTR), a novel decoding strategy that combines topological analysis with Dirichlet-based uncertainty quantification to measure LLM confidence across multiple reasoning paths. EDTR treats each CoT as a vector in high-dimensional space and extracts eight topological risk features capturing the geometric structure of reasoning distributions: tighter, more coherent clusters indicate higher confidence while dispersed, inconsistent paths signal uncertainty. We evaluate EDTR against three state-of-the-art calibration methods across four diverse reasoning benchmarks spanning olympiad-level mathematics (AIME), grade school math (GSM8K), commonsense reasoning, and stock price prediction [zhang2025aime; cobbe2021training; talmor-etal-2019-commonsenseqa; yahoo_finance]. EDTR achieves 41% better calibration than competing methods with an average ECE of 0.287 and the best overall composite score of 0.672, while notably achieving perfect accuracy on AIME and exceptional calibration on GSM8K with an ECE of 0.107, domains where baselines exhibit severe overconfidence. Our work provides a geometric framework for understanding and quantifying uncertainty in multi-step LLM reasoning, enabling more reliable deployment where calibrated confidence estimates are essential.
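The abstract's geometric intuition (tight, coherent clusters of reasoning-path embeddings imply higher confidence; dispersed paths imply uncertainty) can be sketched with a few dispersion statistics. The paper's eight specific topological risk features are not enumerated in this summary, so the feature names and formulas below are illustrative assumptions, not EDTR's actual definitions:

```python
import numpy as np

def dispersion_risk_features(embeddings):
    """Illustrative dispersion statistics over CoT path embeddings.

    `embeddings` is an (n_paths, dim) array, one vector per sampled
    reasoning path. Tighter clusters yield smaller values, mirroring
    the intuition that coherent paths indicate higher confidence.
    NOTE: these are assumed stand-ins, not the paper's eight features.
    """
    X = np.asarray(embeddings, dtype=float)
    centroid = X.mean(axis=0)
    # distance of each path embedding to the cluster centroid
    centroid_dists = np.linalg.norm(X - centroid, axis=1)
    # all pairwise distances between path embeddings
    diffs = X[:, None, :] - X[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=-1)
    n = len(X)
    return {
        "mean_centroid_dist": centroid_dists.mean(),
        "max_centroid_dist": centroid_dists.max(),
        # diagonal is zero, so divide by the number of off-diagonal pairs
        "mean_pairwise_dist": pairwise.sum() / (n * (n - 1)),
        "centroid_dist_std": centroid_dists.std(),
    }
```

A risk score built from such features would be low for a tight cluster of near-identical reasoning paths and high when sampled paths scatter across embedding space.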
Problem

Research questions and friction points this paper is trying to address.

Improving confidence calibration for Chain-of-Thought reasoning in LLMs
Addressing overconfidence in incorrect predictions through geometric analysis
Quantifying uncertainty across multiple reasoning paths using topological features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining topological analysis with Dirichlet uncertainty quantification
Extracting topological risk features from reasoning path distributions
Achieving better calibration via geometric framework for uncertainty
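The Dirichlet side of the method can be sketched in an evidential style: treat the counts of each distinct final answer across sampled CoT paths as evidence for a Dirichlet distribution, and read confidence off its mean. This is an assumed formulation for illustration; the paper's exact estimator may differ:

```python
from collections import Counter

def dirichlet_confidence(answers, prior=1.0):
    """Illustrative Dirichlet-based confidence over sampled CoT answers.

    Each distinct answer's count acts as evidence; the Dirichlet mean
    gives the confidence of the top answer, and the total evidence
    mass yields a vacuity-style uncertainty. Assumed formulation, not
    necessarily EDTR's.
    """
    counts = Counter(answers)
    k = len(counts)                      # number of distinct answers
    alphas = {a: c + prior for a, c in counts.items()}
    strength = sum(alphas.values())      # total Dirichlet evidence
    best = max(alphas, key=alphas.get)
    confidence = alphas[best] / strength # expected prob. of top answer
    vacuity = k * prior / strength       # uncertainty from weak evidence
    return best, confidence, vacuity
```

For example, nine paths agreeing on one answer with a single dissenter yields high confidence, while an even split between answers drives confidence toward 0.5 and uncertainty up.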
Abhishek More
Algoverse AI Research
Anthony Zhang
Algoverse AI Research
Nicole Bonilla
Algoverse AI Research
Ashvik Vivekan
Algoverse AI Research
Kevin Zhu
PhD, Stanford University; Professor of Business+Technology, University of California, San Diego
IT, data, e-commerce, software, digital transformation
Parham Sharafoleslami
Algoverse AI Research
Maheep Chaudhary
Independent Research
Causal Inference, Machine Learning