🤖 AI Summary
This work addresses a critical limitation of large language models in long-horizon reasoning: the accumulation of irrecoverable errors under extreme atomic decomposition, where mistakes concentrate at a few pivotal steps and resist correction. To tackle this, the authors propose Lookahead-Enhanced Atomic Decomposition (LEAD), a novel approach that identifies and mitigates reasoning bottlenecks caused by non-uniform error distributions. LEAD integrates short-horizon lookahead verification with overlapping trajectory aggregation, preserving local context while maintaining decomposition stability so that errors can be corrected before they become irreversible. Experimental results demonstrate that LEAD enables the o4-mini model to solve Checkers Jumping problems up to complexity level n=13, substantially outperforming existing extreme decomposition methods, which fail beyond n=11.
📝 Abstract
Long-horizon execution in Large Language Models (LLMs) remains unstable even when high-level strategies are provided. Evaluating on controlled algorithmic puzzles, we demonstrate that while decomposition is essential for stability, extreme decomposition creates a "no-recovery bottleneck". We show that this bottleneck becomes critical due to a highly non-uniform error distribution, where consistent errors on a few "hard" steps become irreversible. To address this, we propose Lookahead-Enhanced Atomic Decomposition (LEAD). By incorporating short-horizon future validation and aggregating overlapping rollouts, LEAD provides enough isolation to maintain stability while retaining enough local context to correct errors. This enables the o4-mini model to solve Checkers Jumping up to complexity $n=13$, whereas extreme decomposition fails beyond $n=11$.
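The paper applies this selection rule to LLM reasoning steps; purely as an illustrative sketch, the same "short-horizon lookahead plus aggregation over multiple rollouts" idea can be mimicked on a toy search problem. All names here (`lead_select`, `propose`, `step`, `score`) are hypothetical and not from the paper:

```python
import random

def lead_select(state, propose, step, score, horizon=3, n_rollouts=20):
    """Pick the next atomic move via short-horizon lookahead.

    For each candidate first move, run several stochastic rollouts of
    length `horizon`, then aggregate (average) their terminal scores.
    Rollouts sharing a first move overlap on that prefix, so consistent
    futures reinforce it -- a toy stand-in for LEAD's aggregation of
    overlapping rollouts.
    """
    best_move, best_value = None, float("-inf")
    for move in propose(state):
        total = 0.0
        for _ in range(n_rollouts):
            s = step(state, move)            # commit the candidate move
            for _ in range(horizon - 1):     # then sample a short future
                s = step(s, random.choice(propose(s)))
            total += score(s)
        value = total / n_rollouts
        if value > best_value:
            best_move, best_value = move, value
    return best_move

# Toy instance: walk on the integers toward a goal at 10.
propose = lambda s: [+1, -1]
step = lambda s, m: s + m
score = lambda s: -abs(10 - s)   # closer to the goal is better

random.seed(0)
print(lead_select(0, propose, step, score))
```

Here the one-step scores of +1 and -1 are noisy individually, but averaging over many short rollouts makes the move that heads toward the goal win reliably; the paper's actual method replaces these toy functions with LLM-proposed atomic steps and validation.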