🤖 AI Summary
Large language models (LLMs) frequently regenerate identical intermediate reasoning steps during multi-step inference, leading to excessive token consumption, increased latency, context saturation, and diminished exploratory capacity. To address this, the authors propose the *Behavior Handbook*: a metacognitive mechanism that identifies recurring reasoning patterns through the model's own analysis of prior chain-of-thought traces, distills them into reusable, structured behavioral units (name + instruction), and supports three application modes: in-context injection, self-improvement, and supervised fine-tuning (SFT). Experiments show up to a 46% reduction in reasoning tokens with maintained or improved accuracy; up to 10% higher accuracy in self-improvement over a critique-and-revise baseline; and more effective SFT for converting non-reasoning models into reasoning models. The core contribution is making implicit reasoning explicit as modular, retrievable, composable, and evolvable behavioral knowledge.
📝 Abstract
Large language models (LLMs) now solve multi-step problems by emitting extended chains of thought. In the process, they often re-derive the same intermediate steps across problems, which inflates token usage and latency, saturates the context window, and leaves less capacity for exploration. We study a simple mechanism that converts recurring reasoning fragments into concise, reusable "behaviors" (name + instruction) via the model's own metacognitive analysis of prior traces. These behaviors are stored in a "behavior handbook," which supplies them to the model in-context at inference time or distills them into parameters via supervised fine-tuning (SFT). This approach improves test-time reasoning in three settings: (1) Behavior-conditioned inference: providing the LLM relevant behaviors in-context during reasoning reduces the number of reasoning tokens by up to 46% while matching or improving baseline accuracy. (2) Behavior-guided self-improvement: without any parameter updates, the model improves its own future reasoning by leveraging behaviors extracted from its past problem-solving attempts, yielding up to 10% higher accuracy than a naive critique-and-revise baseline. (3) Behavior-conditioned SFT: fine-tuning on behavior-conditioned reasoning traces converts non-reasoning models into reasoning models more effectively than vanilla SFT. Together, these results indicate that turning slow derivations into fast procedural hints lets LLMs remember how to reason, not just what to conclude.
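The handbook-and-injection loop described above can be sketched as a small pipeline: store (name, instruction) behaviors, retrieve the ones relevant to a new problem, and prepend them to the prompt. This is a minimal illustrative sketch, not the paper's implementation; the class names, the keyword-overlap retrieval, and the prompt format are all assumptions standing in for whatever extraction and retrieval the actual system uses.

```python
from dataclasses import dataclass


# A "behavior" as described in the abstract: a short name plus an
# instruction distilled from a prior reasoning trace.
@dataclass(frozen=True)
class Behavior:
    name: str
    instruction: str


class BehaviorHandbook:
    """Toy in-memory handbook: stores behaviors and retrieves the ones
    most relevant to a new problem by keyword overlap (a hypothetical
    stand-in for the real retrieval step)."""

    def __init__(self):
        self._behaviors: list[Behavior] = []

    def add(self, behavior: Behavior) -> None:
        self._behaviors.append(behavior)

    def retrieve(self, problem: str, k: int = 2) -> list[Behavior]:
        # Score each behavior by how many words its instruction shares
        # with the problem statement; return the top-k.
        words = set(problem.lower().split())
        scored = sorted(
            self._behaviors,
            key=lambda b: -len(words & set(b.instruction.lower().split())),
        )
        return scored[:k]


def behavior_conditioned_prompt(problem: str, handbook: BehaviorHandbook) -> str:
    """Prepend retrieved behaviors to the problem so the model can reuse
    them in-context instead of re-deriving the same steps."""
    hints = "\n".join(
        f"- {b.name}: {b.instruction}" for b in handbook.retrieve(problem)
    )
    return f"Useful behaviors:\n{hints}\n\nProblem: {problem}"
```

For example, a handbook holding a `gcd_first` behavior ("reduce the fraction by computing the gcd of numerator and denominator") would rank it first for a fraction-arithmetic problem and surface it as an in-context hint, rather than the model re-deriving the reduction procedure from scratch.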