ConSol: Sequential Probability Ratio Testing to Find Consistent LLM Reasoning Paths Efficiently

📅 2025-03-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) employ self-consistency to improve reasoning accuracy, but this approach requires extensive sampling (often 40 to 64 reasoning paths per task), incurring high computational cost and poor token efficiency. To address this, we propose a dynamic termination mechanism grounded in the Sequential Probability Ratio Test (SPRT), the first adaptation of SPRT to assessing consistency among LLM reasoning paths. Our method applies a statistical significance criterion to decide in real time whether the modal answer has converged, thereby eliminating redundant sampling. By jointly modeling reasoning paths and calibrating the test's sensitivity, our approach maintains accuracy comparable to standard self-consistency while substantially reducing the number of samples drawn, yielding large gains in token efficiency without compromising reliability. The implementation, including code and datasets, is publicly available and installable via pip.

📝 Abstract
Recent advancements in large language models (LLMs) integrating explicit reasoning, such as OpenAI's o3-mini, DeepSeek-R1, and QWQ-32B, enable smaller models to solve complex tasks by generating intermediate reasoning steps prior to providing answers. However, this approach significantly increases computational costs, both monetarily and environmentally. The widely used self-consistency method further exacerbates these costs by aggregating multiple reasoning paths to improve accuracy, often requiring between 40 and 64 samples per task. Although aggregation effectively reduces variance and bias, additional sampling can lead to diminishing returns when early samples yield consistent results. To address these inefficiencies, we propose leveraging Sequential Probability Ratio Testing (SPRT) to dynamically terminate sampling once sufficient consistency is achieved. We calibrate SPRT parameters specifically for LLM applications, accounting for sensitivity to detect the mode of the distribution. Our experiments demonstrate that incorporating SPRT significantly enhances token efficiency, achieving comparable accuracy to self-consistency methods but at a substantially reduced computational cost. To promote transparency and facilitate reproducibility, we have made the source code and datasets used in our experiments publicly available at our GitHub repository: https://github.com/LiuzLab/consol, or available as a PyPI package: pip install consol. We hope that this resource will support further research and encourage the development of new methods building upon our work.
Problem

Research questions and friction points this paper is trying to address.

High computational cost of sampling multiple LLM reasoning paths
Inefficiency of fixed-budget self-consistency sampling
Deciding when sampling can be terminated dynamically
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Sequential Probability Ratio Testing (SPRT)
Dynamically terminates sampling for efficiency
Calibrates SPRT for LLM sensitivity and mode detection
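
The early-stopping idea described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the paper's implementation: it models each draw as a Bernoulli trial on whether the sample matches the current modal answer, and the hypothesized mode probabilities `p0`/`p1` and error rates `alpha`/`beta` are placeholder values, not ConSol's calibrated settings.

```python
import math
from collections import Counter

def sprt_self_consistency(sample_fn, p0=0.5, p1=0.8,
                          alpha=0.05, beta=0.05, max_samples=64):
    """Draw answers until an SPRT-style test decides sampling can stop.

    sample_fn() returns one answer string. p0/p1 are the hypothesized mode
    probabilities under "inconsistent" vs. "consistent" reasoning paths
    (illustrative values, not the paper's calibrated parameters).
    """
    # Wald's decision boundaries on the log-likelihood-ratio scale.
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    counts = Counter()
    for n in range(1, max_samples + 1):
        counts[sample_fn()] += 1
        mode, k = counts.most_common(1)[0]
        # Log-likelihood ratio for k mode hits in n Bernoulli draws.
        llr = (k * math.log(p1 / p0)
               + (n - k) * math.log((1 - p1) / (1 - p0)))
        if llr >= upper:  # strong evidence the mode has converged: stop early
            return mode, n
        if llr <= lower:  # evidence of inconsistency: stop, keep current mode
            return mode, n
    # Budget exhausted: fall back to plain self-consistency (majority vote).
    return counts.most_common(1)[0][0], max_samples
```

With a model that answers consistently, the test crosses the upper boundary after a handful of samples instead of the full 40-to-64-sample budget, which is the source of the token savings the paper reports.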