๐ค AI Summary
This work addresses the limitations of existing meta-reasoning approaches, which are typically single-episode and event-driven, lacking mechanisms for accumulating metacognitive skills across instancesโleading to repeated failures and high cognitive overhead. To overcome this, the paper introduces a Metacognitive Consolidation framework that decouples reasoning, monitoring, and control roles to generate traceable meta-level trajectories. By integrating these trajectories through a hierarchical, multi-timescale mechanism, the framework consolidates experiences into continuously evolving metaknowledge. This approach enables, for the first time, the accumulation and reuse of metacognitive experience across tasks, thereby establishing a meta-reasoning system capable of sustained self-improvement. Empirical results demonstrate consistent performance gains across multiple benchmarks and backbone models, with reasoning capabilities progressively enhancing as metacognitive experience accumulates.
๐ Abstract
Large language models (LLMs) have demonstrated strong reasoning capabilities, and as existing approaches for enhancing LLM reasoning continue to mature, increasing attention has shifted toward meta-reasoning as a promising direction for further improvement. However, most existing meta-reasoning methods remain episodic: they focus on executing complex meta-reasoning routines within individual instances, but ignore the accumulation of reusable meta-reasoning skills across instances, leading to recurring failure modes and repeatedly high metacognitive effort. In this paper, we introduce Metacognitive Consolidation, a novel framework in which a model consolidates metacognitive experience from past reasoning episodes into reusable knowledge that improves future meta-reasoning. We instantiate this framework by structuring instance-level problem solving into distinct roles for reasoning, monitoring, and control to generate rich, attributable meta-level traces. These traces are then consolidated through a hierarchical, multi-timescale update mechanism that gradually forms evolving meta-knowledge. Experimental results demonstrate consistent performance gains across benchmarks and backbone models, and show that performance improves as metacognitive experience accumulates over time.