🤖 AI Summary
Lightweight language models exhibit limited performance on planning and mathematical reasoning tasks, whereas large models incur prohibitive computational costs. Method: This paper proposes a synergistic framework combining generalised strategy distillation with iterative self-correction. Cross-task general strategies, generated by a larger model, are transferred to lightweight models, and an iterative self-correcting prompting mechanism enables dynamic refinement of proposed solutions without any change to model parameters. Contribution/Results: On multiple planning and mathematical reasoning benchmarks, the approach raises the performance of lightweight models to levels comparable with their larger counterparts, while the use of generalised strategies reduces the lightweight model's average cost by nearly 30 percent.
📝 Abstract
Recent advancements in the reasoning skills of Large Language Models (LLMs) demonstrate an increase in their ability to solve simple planning tasks. However, as long as the driving force behind improved reasoning capability is the size and complexity of models, the financial and computational costs of running them will also increase. This trend raises questions about continued accessibility and about whether these improvements will continue at the same pace as models grow in size and expense. We propose two approaches to enhance the reasoning ability of less resource-intensive LLMs: (1) provide them with a generalised strategy for solving tasks within a given domain, generated by a more resource-intensive LLM; (2) exploit their cost-effectiveness by iteratively prompting these models to correct errors in their proposed solutions. Our empirical results from planning and mathematical reasoning tasks demonstrate that these methods improve the performance of less resource-intensive LLMs to levels comparable with their more resource-intensive counterparts, at a fraction of the cost. Additionally, we show that the utilisation of generalised strategies in our experiments reduced the cost of the less resource-intensive model by nearly 30 percent on average.
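The two approaches can be sketched as a single loop: a domain strategy (distilled from a larger model) is prepended to the prompt, and an external verifier's feedback drives repeated correction attempts by the lightweight model. The sketch below is a minimal illustration, not the paper's implementation; `solve`, `verify`, and the strategy text are hypothetical stand-ins for the model call and checker used in the experiments.

```python
def self_correct(task, strategy, solve, verify, max_iters=3):
    """Iterative self-correction with a distilled strategy (illustrative sketch).

    strategy: generalised, domain-level guidance distilled from a larger model,
              prepended to every prompt sent to the lightweight model.
    solve(prompt) -> candidate solution (one lightweight-model call).
    verify(task, solution) -> (is_valid, feedback) from an external checker.
    """
    solution = solve(f"{strategy}\n\nTask: {task}")
    for _ in range(max_iters):
        ok, feedback = verify(task, solution)
        if ok:
            return solution
        # Re-prompt the same model with the checker's feedback appended.
        solution = solve(f"{strategy}\n\nTask: {task}\n"
                         f"Previous attempt: {solution}\n"
                         f"Feedback: {feedback}\nPlease correct it.")
    return solution  # best effort after max_iters corrections


# Toy usage with stubbed model and checker: first attempt is wrong,
# the feedback-driven retry succeeds.
attempts = iter(["5", "4"])
result = self_correct(
    task="2+2",
    strategy="Work step by step and state only the final number.",
    solve=lambda prompt: next(attempts),
    verify=lambda task, sol: (sol == "4", "wrong answer, recheck the sum"),
)
```

The loop terminates either on the first verified solution or after a fixed correction budget, which is what keeps the lightweight model's total cost bounded and predictable.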