AI Summary
This work addresses the performance degradation of large language models on long-horizon reasoning tasks as test-time compute budgets increase, a phenomenon often caused by over-reasoning under static planning paradigms such as chain-of-thought. The authors propose the "Limited Reasoning Space" hypothesis, positing that models possess an intrinsic reasoning boundary beyond which redundant planning impairs performance. To operationalize this insight, they formulate LLM reasoning for the first time as a non-autonomous stochastic dynamical system and introduce the concept of an optimal compute budget interval. They design Halo, a dynamic control framework that integrates model predictive control with an entropy-driven dual-controller mechanism to enable Measure-then-Plan replanning. Experiments demonstrate that Halo significantly outperforms static baselines on complex long-horizon tasks, validating the efficacy of dynamically regulating reasoning within its intrinsic boundary.
Abstract
Test-time compute strategies, such as Chain-of-Thought (CoT), have significantly enhanced the ability of large language models to solve complex tasks like logical reasoning. However, empirical studies indicate that simply increasing the compute budget can lead to a collapse in test-time performance when employing typical task decomposition strategies such as CoT. This work hypothesizes that reasoning failures under larger compute budgets stem from static planning methods, which cannot perceive the intrinsic boundaries of LLM reasoning. We term this the Limited Reasoning Space hypothesis and analyze it theoretically through the lens of a non-autonomous stochastic dynamical system. This insight suggests that there is an optimal range for the compute budget: over-planning produces redundant feedback and may even impair reasoning capability. To exploit the benefits of compute scaling while suppressing over-planning, this work proposes Halo, a model predictive control framework for LLM planning. Halo is designed for long-horizon tasks with reason-based planning and introduces an entropy-driven dual controller, which adopts a Measure-then-Plan strategy to achieve controllable reasoning. Experimental results demonstrate that Halo outperforms static baselines on complex long-horizon tasks by dynamically regulating planning at the reasoning boundary.
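To make the Measure-then-Plan idea concrete, the following is a minimal sketch of an entropy-driven dual controller in an MPC-style loop. It is an illustration under assumptions, not the paper's implementation: the `model_step` interface, the `low`/`high` entropy thresholds, and the three control actions (`replan`, `continue`, `commit`) are hypothetical stand-ins for whatever signals and interventions Halo actually uses.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def measure_then_plan(model_step, budget, low=0.5, high=2.0):
    """MPC-style Measure-then-Plan loop (illustrative sketch).

    At each step the controller measures the model's predictive entropy,
    then chooses one of two interventions (the "dual controller"):
      - entropy above `high`  -> the model is too uncertain: trigger a replan
        rather than extending the current chain of thought;
      - entropy below `low`   -> the model is confident: commit and stop,
        avoiding over-planning past the reasoning boundary;
      - otherwise             -> continue executing the current plan.

    `model_step` is a hypothetical callable returning (action, probs) for one
    reasoning step; `budget` caps the number of steps (the compute budget).
    """
    trace = []
    for step in range(budget):
        _action, probs = model_step(step)
        h = entropy(probs)
        if h > high:
            trace.append(("replan", step, h))
        elif h < low:
            trace.append(("commit", step, h))
            break
        else:
            trace.append(("continue", step, h))
    return trace
```

As a usage sketch, a stub whose uncertainty shrinks over time (uniform over 8 tokens, then 2, then a peaked distribution) yields the sequence replan, continue, commit, stopping well before the budget is exhausted.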