Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a confidence-aware adaptive reasoning framework that addresses two sources of inefficiency in large language model reasoning: chain-of-thought prompting often generates unnecessarily lengthy reasoning paths, and multi-path self-consistency methods, while improving accuracy, incur substantial computational overhead. The approach estimates uncertainty from sentence-level semantic and numerical features within a single reasoning trajectory, enabling a dynamic decision, without any fine-tuning, on whether to invoke costly multi-path reasoning. Trained solely on MedQA, the method generalizes strongly across diverse benchmarks including MathQA, MedMCQA, and MMLU, achieving accuracy comparable to multi-path baselines while reducing token consumption by up to 80%, substantially improving inference efficiency.

📝 Abstract
Large language models (LLMs) achieve strong reasoning performance through chain-of-thought (CoT) reasoning, yet often generate unnecessarily long reasoning paths that incur high inference cost. Recent self-consistency-based approaches further improve accuracy but require sampling and aggregating multiple reasoning trajectories, leading to substantial additional computational overhead. This paper introduces a confidence-aware decision framework that analyzes a single completed reasoning trajectory to adaptively select between single-path and multi-path reasoning. The framework is trained using sentence-level numeric and linguistic features extracted from intermediate reasoning states in the MedQA dataset and generalizes effectively to MathQA, MedMCQA, and MMLU without additional fine-tuning. Experimental results show that the proposed method maintains accuracy comparable to multi-path baselines while using up to 80% fewer tokens. These findings demonstrate that reasoning trajectories contain rich signals for uncertainty estimation, enabling a simple, transferable mechanism to balance accuracy and efficiency in LLM reasoning.
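The decision logic the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate` stands in for a CoT-prompted LLM call, `confidence` for the paper's trained estimator (which uses sentence-level numeric and linguistic features of the trajectory; here it is abstracted to a callable), and the threshold of 0.7 is an arbitrary assumption.

```python
from collections import Counter

def adaptive_reasoning(question, generate, confidence, n_paths=5, threshold=0.7):
    """Confidence-gated self-consistency (illustrative sketch).

    generate(question) -> (answer, trajectory): one CoT pass of an LLM.
    confidence(trajectory) -> float in [0, 1]: placeholder for the paper's
    trained uncertainty estimator over sentence-level trajectory features.
    """
    answer, trajectory = generate(question)       # single completed reasoning path
    if confidence(trajectory) >= threshold:       # confident: accept single-path answer
        return answer
    # Uncertain: fall back to multi-path self-consistency and majority-vote,
    # reusing the already-generated first path to save one sample.
    answers = [answer] + [generate(question)[0] for _ in range(n_paths - 1)]
    return Counter(answers).most_common(1)[0][0]
```

The efficiency gain comes from skipping the extra `n_paths - 1` samples whenever the estimator is confident, which is how the paper trades a small, trained classifier for up to 80% of the token budget.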
Problem

Research questions and friction points this paper is trying to address.

chain-of-thought reasoning
computational efficiency
self-consistency
large language models
inference cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

confidence-aware reasoning
self-consistency
chain-of-thought
adaptive inference
efficiency-accuracy trade-off
Juming Xiong
Vanderbilt University
deep learning, computer vision, medical image processing

Kevin Guo
Vanderbilt University

Congning Ni
Vanderbilt University Medical Center

Chao Yan
Instructor at DBMI, VUMC; CS PhD from Vanderbilt U
AI for medicine, synthetic health data, privacy, fairness

Katherine Brown
Vanderbilt University Medical Center

Avinash Baidya
Intuit AI Research

Xiang Gao
Intuit
deep learning

Bradley Marlin
Vanderbilt University, Vanderbilt University Medical Center

Zhijun Yin
Vanderbilt University, Vanderbilt University Medical Center