Early Stopping Chain-of-thoughts in Large Language Models

📅 2025-09-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To cut the computational overhead of excessively long chain-of-thought (CoT) reasoning in large language models (LLMs), this paper proposes ES-CoT, a training-free mechanism that detects answer convergence from run-length statistics. ES-CoT prompts the model to state its current answer at the end of each reasoning step and tracks the run length of consecutive identical step answers during inference; a sharp increase in run length that also exceeds a minimum threshold is treated as the signal of convergence and triggers early termination. Evaluated on three LLMs and five reasoning benchmarks, ES-CoT reduces token consumption by about 41% on average while keeping accuracy comparable to standard CoT. It also integrates with self-consistency prompting and requires no architectural or training changes.

📝 Abstract

Reasoning large language models (LLMs) have demonstrated superior capacities in solving complicated problems by generating long chain-of-thoughts (CoT), but such a lengthy CoT incurs high inference costs. In this study, we introduce ES-CoT, an inference-time method that shortens CoT generation by detecting answer convergence and stopping early with minimal performance loss. At the end of each reasoning step, we prompt the LLM to output its current final answer, denoted as a step answer. We then track the run length of consecutive identical step answers as a measure of answer convergence. Once the run length exhibits a sharp increase and exceeds a minimum threshold, the generation is terminated. We provide both empirical and theoretical support for this heuristic: step answers steadily converge to the final answer, and large run-length jumps reliably mark this convergence. Experiments on five reasoning datasets across three LLMs show that ES-CoT reduces the number of inference tokens by about 41% on average while maintaining accuracy comparable to standard CoT. Further, ES-CoT integrates seamlessly with self-consistency prompting and remains robust across hyperparameter choices, highlighting it as a practical and effective approach for efficient reasoning.
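
The stopping rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define how a "sharp increase" is measured, so the sketch assumes the current run must be at least `jump_ratio` times the longest earlier run. The function name `early_stop` and the parameters `min_run` and `jump_ratio` are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def early_stop(
    step_answers: Iterable[str],
    min_run: int = 3,          # assumed minimum run-length threshold
    jump_ratio: float = 2.0,   # assumed definition of a "sharp increase"
) -> Optional[Tuple[int, str]]:
    """Return (step_index, answer) where generation would stop, or None.

    A run is a streak of consecutive identical step answers. We stop once
    the current run is at least `min_run` long and at least `jump_ratio`
    times the longest run seen earlier (an assumption, see lead-in).
    """
    longest_prev = 0   # longest run that has already ended
    current = None     # answer of the run in progress
    run = 0            # length of the run in progress

    for i, answer in enumerate(step_answers):
        if answer == current:
            run += 1
        else:
            # The previous run has ended; remember its length.
            longest_prev = max(longest_prev, run)
            current = answer
            run = 1

        jumped = run >= jump_ratio * max(longest_prev, 1)
        if run >= min_run and jumped:
            return i, answer

    return None  # no convergence detected; let the CoT run to completion


# Step answers as they might be elicited after each reasoning step.
answers = ["12", "15", "15", "12", "18", "18", "18", "18", "18"]
print(early_stop(answers))
```

In a real decoding loop the step answers would arrive one at a time, and generation would be cut off as soon as the condition holds; the remaining steps would never be generated, which is where the token savings come from.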

Problem

Research questions and friction points this paper is trying to address.

Reducing lengthy chain-of-thought inference costs

Detecting answer convergence during reasoning steps

Maintaining accuracy while shortening CoT generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Early stopping based on answer convergence

Tracking run length of identical answers

Reducing inference tokens with minimal accuracy loss
