🤖 AI Summary
How can large language models (LLMs) achieve continual, cumulative improvement in code reasoning? This paper proposes a dynamic self-evolution framework that integrates meta-reflection (real-time internal assessment and correction of the model's logical reasoning during inference) with cross-referencing (inter-agent validation and fusion of heterogeneous solutions). Grounded in a formal framework for code-logic reasoning, the approach enables an LLM to accumulate experience, assimilate external feedback, and refine its subsequent behavior during inference, moving beyond the static-model paradigm. Evaluated on multiple code-reasoning benchmarks, including HumanEval, MBPP, and APPS, the method substantially outperforms state-of-the-art baselines with an average absolute gain of +7.2% in pass@1 accuracy, and shows markedly improved generalization and error recovery. According to the authors, this is the first work to realize verifiable, incremental cognitive evolution of LLMs on code tasks.
📝 Abstract
The ability to reason is one of the most fundamental capabilities of large language models (LLMs), enabling a wide range of downstream tasks through sophisticated problem-solving. A critical aspect of this ability is code reasoning: logical reasoning with formal languages (i.e., programming code). In this paper, we enhance this capability of LLMs by exploring the following question: how can an LLM agent become progressively smarter in code reasoning with each solution it proposes, thereby achieving substantial cumulative improvement? Most existing research takes a static perspective, focusing on isolated problem-solving with frozen LLMs. In contrast, we adopt a cognitive-evolving perspective and propose a novel framework named Meta-Reflection with Cross-Referencing (MARCO) that enables the LLM to evolve dynamically during inference through self-improvement. Drawing on human cognitive development, we leverage both knowledge accumulation and lesson sharing. In particular, to accumulate knowledge during problem-solving, we propose meta-reflection, which reflects on the reasoning paths of the current problem to distill knowledge and experience for future use. Moreover, to effectively utilize the lessons of other agents, we propose cross-referencing, which incorporates the solutions and feedback from other agents into the current problem-solving process. We conduct experiments on various code-reasoning datasets, and the results demonstrate the effectiveness of MARCO.
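To make the two mechanisms concrete, below is a minimal sketch of the loop the abstract describes. The paper does not publish an implementation here, so `MarcoAgent`, the `llm` stub, and all prompt strings are hypothetical placeholders illustrating meta-reflection and cross-referencing, not the authors' API.

```python
from dataclasses import dataclass, field


def llm(prompt: str) -> str:
    """Stub for any LLM completion call; swap in a real client."""
    return f"<model response to: {prompt[:40]}...>"


@dataclass
class MarcoAgent:
    name: str
    experience: list[str] = field(default_factory=list)  # accumulated lessons

    def solve(self, problem: str, peer_attempts: list[str]) -> str:
        # 1. Draft a solution conditioned on lessons distilled from
        #    earlier problems (knowledge accumulation).
        notes = "\n".join(self.experience)
        draft = llm(f"Lessons so far:\n{notes}\n\nSolve:\n{problem}")

        # 2. Meta-reflection: critique the draft's reasoning path and
        #    revise it before committing to an answer.
        critique = llm(f"Find logical flaws in this solution:\n{draft}")
        revised = llm(f"Revise the solution given this critique:\n{critique}\n\n{draft}")

        # 3. Cross-referencing: validate against, and fuse ideas from,
        #    other agents' attempts at the same problem.
        if peer_attempts:
            peers = "\n---\n".join(peer_attempts)
            revised = llm(f"Merge verified ideas from peers:\n{peers}\n\ninto:\n{revised}")

        # 4. Distill a transferable lesson so the next call to solve()
        #    starts from a richer experience store.
        self.experience.append(llm(f"State one reusable lesson from:\n{revised}"))
        return revised


# Usage: two agents take turns; each sees the other's prior attempt.
agents = [MarcoAgent("A"), MarcoAgent("B")]
attempts: list[str] = []
for agent in agents:
    attempts.append(agent.solve("Reverse a linked list in O(n) time.", attempts))
```

In this sketch, each call to `solve` both consumes the accumulated `experience` and appends a new lesson to it, which is what would let an agent improve cumulatively across problems, while `peer_attempts` plays the role of the cross-referenced solutions and feedback from other agents.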