🤖 AI Summary
This work addresses the tendency of large language models to overthink, generating redundant reasoning steps when faced with ambiguous or ill-posed queries. To mitigate this, the authors propose two statistically grounded early-stopping mechanisms that monitor uncertainty signals during generation. The first is parametric: it models the inter-arrival times of uncertainty-indicating keywords as a renewal process and applies sequential hypothesis testing to decide when to halt. The second is nonparametric and carries finite-sample guarantees on the probability of stopping too early on well-posed queries, preventing premature termination on well-defined problems. Empirically, these adaptive termination strategies improve reasoning efficiency and reliability across multiple domains and models, with especially strong gains on mathematical reasoning tasks.
📝 Abstract
While LLMs have seen substantial improvement in reasoning capabilities, they can also overthink, generating unnecessary reasoning steps, particularly when given ill-posed or ambiguous queries. We introduce statistically principled early-stopping methods that monitor uncertainty signals during generation to mitigate this issue. Our first approach is parametric: it models the inter-arrival times of uncertainty keywords as a renewal process and applies sequential testing to decide when to stop. Our second approach is nonparametric and provides finite-sample guarantees on the probability of halting too early on well-posed queries. We conduct empirical evaluations on reasoning tasks across several domains and models. Our results indicate that uncertainty-aware early stopping can improve both efficiency and reliability in LLM reasoning, with especially significant gains for math reasoning.
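To make the parametric approach concrete, here is a minimal sketch of a sequential probability ratio test (SPRT) over keyword inter-arrival times. It assumes gaps between uncertainty keywords follow an exponential distribution (i.e., a Poisson renewal process); the keyword list, rate parameters, and error levels are illustrative assumptions, not the paper's actual choices.

```python
import math

# Hypothetical uncertainty markers; the paper's actual keyword list is not specified here.
UNCERTAINTY_KEYWORDS = {"maybe", "unsure", "unclear", "possibly", "ambiguous"}

def sprt_stop(inter_arrival_gaps, rate_h0=0.02, rate_h1=0.10,
              alpha=0.05, beta=0.05):
    """Sequential probability ratio test on keyword inter-arrival gaps.

    Models gaps (in tokens) between uncertainty keywords as exponential draws.
    H0: low keyword rate (well-posed query, keep reasoning).
    H1: high keyword rate (uncertain query, stop early).
    All rates and error levels are illustrative placeholders.
    Returns "stop", "accept_h0", or "continue".
    """
    upper = math.log((1 - beta) / alpha)   # cross above -> accept H1, stop generation
    lower = math.log(beta / (1 - alpha))   # cross below -> accept H0, keep reasoning
    llr = 0.0
    for gap in inter_arrival_gaps:
        # Log-likelihood ratio of one exponential observation:
        # log f(gap; rate_h1) - log f(gap; rate_h0)
        llr += math.log(rate_h1 / rate_h0) - (rate_h1 - rate_h0) * gap
        if llr >= upper:
            return "stop"
        if llr <= lower:
            return "accept_h0"
    return "continue"
```

Short, frequent gaps push the log-likelihood ratio toward the upper boundary and trigger early termination, while long gaps favor the null and let reasoning continue; Wald's thresholds tie the boundaries to the target false-stop and missed-stop rates.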