🤖 AI Summary
Current large language models lack revisable reasoning representations for mathematical problem solving: they are often constrained by rigid inference pipelines or unreliable self-evaluation, and they are susceptible to distraction from program context. This work proposes Iteratively Improved Program Construction (IIPC), an approach that integrates execution feedback with chain-of-thought reasoning through a multi-agent architecture to dynamically refine programmatic reasoning chains. By maintaining high-level contextual focus while enabling error correction, IIPC outperforms competing methods on the majority of the mathematical reasoning benchmarks evaluated. The framework works with multiple base large language models, and all source code is publicly released.
📝 Abstract
Mathematical problem solving is a fundamental benchmark for assessing the reasoning capabilities of artificial intelligence and a gateway to applications in education, science, and engineering where reliable symbolic reasoning is essential. Although recent advances in multi-agent LLM-based systems have enhanced mathematical reasoning capabilities, these systems still lack a reliably revisable representation of the reasoning process. Existing agents either operate in rigid sequential pipelines that cannot correct earlier steps or rely on heuristic self-evaluation that can fail to identify and fix errors. In addition, programmatic context can distract language models and degrade accuracy. To address these gaps, we introduce Iteratively Improved Program Construction (IIPC), a reasoning method that iteratively refines programmatic reasoning chains and combines execution feedback with the native chain-of-thought abilities of the base LLM to maintain high-level contextual focus. IIPC surpasses competing approaches on the majority of reasoning benchmarks across multiple base LLMs. All code and implementations are released as open source.
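To make the idea of refining a program from execution feedback concrete, the following is a minimal sketch of a generic execute-and-revise loop. It is not the released IIPC implementation: the function names `run_program`, `refine_until_runs`, and `toy_refine` are hypothetical, the rule-based `toy_refine` merely stands in for a call to a base LLM, and the sketch leaves out the chain-of-thought component entirely.

```python
import contextlib
import io
import traceback
from typing import Callable, Optional, Tuple


def run_program(source: str) -> Tuple[Optional[str], Optional[str]]:
    """Execute candidate program text and return (stdout, error).

    Exactly one of the two values is None. Uses exec() for illustration
    only; untrusted model output should run in a sandbox instead.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {})
    except Exception:
        return None, traceback.format_exc(limit=1)
    return buffer.getvalue().strip(), None


def refine_until_runs(
    program: str,
    refine: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, Optional[str]]:
    """Revise `program` with `refine` until it executes or rounds run out.

    `refine` receives the current program and the error text and returns
    a revised program. The returned output is None if no revision ran.
    """
    output, error = run_program(program)
    rounds = 0
    while error is not None and rounds < max_rounds:
        program = refine(program, error)
        output, error = run_program(program)
        rounds += 1
    return program, output


def toy_refine(program: str, error: str) -> str:
    """Hypothetical stand-in for an LLM revision step (assumption)."""
    if "NameError" in error:
        return "import math\n" + program
    return program


# A draft program that fails because `math` was never imported.
draft = "print(math.factorial(5) // math.factorial(3))"
final_program, answer = refine_until_runs(draft, toy_refine)
print(answer)
```

In this toy run the first execution raises a `NameError`, the stand-in reviser prepends the missing import, and the second execution succeeds. In a real system the reviser would be a model call, and execution success alone would not establish that the answer is correct.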