🤖 AI Summary
Existing research lacks a formal framework for characterizing the complexity of intricate computational reasoning tasks—such as Boolean satisfiability, the 24 Game, and automated planning—leading to suboptimal performance of large language models (LLMs) on these problems. To address this, we propose the Predicate-Enumeration-Aggregation (PEA) framework, which decomposes reasoning into three structured phases: predicate modeling, combinatorial enumeration, and result aggregation. PEA leverages LLMs to synthesize executable programs that interface with external solvers for verification-driven execution. This work establishes the first structured, formal system for computational reasoning, explicitly modeling logical constraints and search spaces—thereby transcending the limitations of purely textual inference. Experiments across diverse benchmarks demonstrate an average accuracy improvement of ~50%, alongside significant reductions in error rate, redundant reasoning steps, and solving latency. The framework achieves both high reliability and strong interpretability.
📝 Abstract
Large Language Models (LLMs) have exhibited remarkable capabilities across diverse domains, prompting investigations into their potential as generic reasoning engines. While recent studies have explored inference-time computation to enhance model performance on complex problems, current research lacks a formal framework to characterize the complexity of reasoning tasks. This study introduces the Predicate-Enumeration-Aggregation (PEA) framework, a formal approach to describe and solve a class of important reasoning tasks termed computational reasoning problems. The PEA framework decomposes these problems into predicate and enumeration components, using LLMs to synthesize programs based on specified predicates, enumeration, and aggregation rules. These synthesized programs are then executed to obtain solutions to the computational tasks. We demonstrate the framework's efficacy on benchmark tasks including Boolean satisfiability problems, the Game of $24$, and planning problems. Empirical evaluation reveals that PEA substantially enhances the performance of underlying models on benchmark computational problems, yielding an average accuracy improvement of approximately $50\%$, coupled with increased efficiency.
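To make the three-phase decomposition concrete, here is a minimal sketch of how a PEA-style synthesized program might look for the Game of 24. This is an illustrative example, not the paper's actual synthesis output: the function names (`predicate`, `enumerate_candidates`, `aggregate`) and the restriction to left-associated expressions are assumptions made for brevity.

```python
from itertools import permutations, product
from fractions import Fraction

# Illustrative PEA sketch for the Game of 24 (not the paper's actual API).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else None,  # guard division by zero
}

def predicate(value, target=24):
    """Predicate phase: does a candidate value satisfy the goal constraint?"""
    return value is not None and value == target

def enumerate_candidates(numbers):
    """Enumeration phase: all left-associated expressions over the inputs."""
    for nums in permutations(numbers):
        for ops in product(OPS, repeat=len(nums) - 1):
            value = Fraction(nums[0])  # exact rational arithmetic
            expr = str(nums[0])
            for op, n in zip(ops, nums[1:]):
                value = OPS[op](value, Fraction(n)) if value is not None else None
                expr = f"({expr} {op} {n})"
            yield expr, value

def aggregate(numbers, target=24):
    """Aggregation phase: collect every expression passing the predicate."""
    return sorted({expr for expr, v in enumerate_candidates(numbers)
                   if predicate(v, target)})
```

Once synthesized, such a program is executed deterministically, so correctness rests on the enumerated search space and the predicate check rather than on token-by-token textual inference, which is the reliability argument the abstract makes.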