🤖 AI Summary
This work proposes a reasoning paradigm that targets the inefficiency and high computational cost of large language models (LLMs) that generate long, verbose intermediate reasoning steps. Instead of the conventional “reasoning from scratch” approach, the method maintains a retrievable library of reasoning skills. Effective reasoning paths discovered through deliberation and trial-and-error exploration are summarized and stored in this library, and the model retrieves relevant skills at inference time to guide its reasoning and avoid redundant derivations. Evaluated on code generation and mathematical reasoning tasks, the approach significantly reduces reasoning-token consumption while improving overall performance, which lowers per-request cost and suggests practical value for real-world deployment.
📝 Abstract
Reasoning LLMs often spend substantial tokens on long intermediate reasoning traces (e.g., chain-of-thought) when solving new problems. We propose to summarize and store reusable reasoning skills distilled from extensive deliberation and trial-and-error exploration, and to retrieve these skills at inference time to guide future reasoning. Unlike the prevailing “reasoning from scratch” paradigm, our approach first recalls relevant skills for each query, helping the model avoid redundant detours and focus on effective solution paths. We evaluate our method on coding and mathematical reasoning tasks, and find that it significantly reduces reasoning tokens while improving overall performance. The resulting lower per-request cost indicates strong practical and economic potential for real-world deployment.
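The recall-then-reason flow described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Skill` schema, the `SkillLibrary` class, the bag-of-words cosine similarity (a stand-in for whatever retriever the paper actually uses), the `min_score` threshold, and the prompt layout are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Skill:
    """A reusable reasoning skill (hypothetical schema, not from the paper)."""
    trigger: str  # short description of the problems the skill applies to
    advice: str   # distilled guidance summarized from earlier exploration


def _bag(text: str) -> Counter:
    # Lowercased bag of words; a simple stand-in for a learned text embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SkillLibrary:
    """In-memory store of skills with similarity-based retrieval."""

    def __init__(self) -> None:
        self._skills: list[Skill] = []

    def add(self, skill: Skill) -> None:
        self._skills.append(skill)

    def retrieve(self, query: str, k: int = 2, min_score: float = 0.1) -> list[Skill]:
        # Score every skill against the query, drop weak matches, keep the top k.
        q = _bag(query)
        scored = [(_cosine(q, _bag(s.trigger)), s) for s in self._skills]
        scored = [(score, s) for score, s in scored if score >= min_score]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [s for _, s in scored[:k]]


def build_prompt(query: str, skills: list[Skill]) -> str:
    # Prepend recalled skills so the model can start from them
    # rather than re-deriving the same ideas in its reasoning trace.
    if not skills:
        return f"Problem:\n{query}"
    hints = "\n".join(f"- {s.advice}" for s in skills)
    return f"Relevant skills:\n{hints}\n\nProblem:\n{query}"


library = SkillLibrary()
library.add(Skill(
    "count paths in a grid with dynamic programming",
    "Define dp[i][j] as the number of paths reaching cell (i, j).",
))
library.add(Skill(
    "sum of an arithmetic series",
    "Use n * (first + last) / 2 instead of adding terms one by one.",
))

query = "Find the sum of the arithmetic series 3, 7, 11, ..., 99"
prompt = build_prompt(query, library.retrieve(query, k=1))
print(prompt)
```

In this sketch only the arithmetic-series skill shares words with the query, so it is the one placed ahead of the problem statement. A real system would presumably populate the library automatically from summarized reasoning traces and pass the resulting prompt to the reasoning model; both steps are outside the scope of this example.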