🤖 AI Summary
This work addresses the lack of structured knowledge-preservation mechanisms in self-evolving agents with frozen backbone language models by proposing MAGE (Multi-Agent Graph-guided Evolution). MAGE stores and accumulates knowledge in a co-evolving knowledge graph composed of four subgraphs (experience, tasks, skills, and relations) and uses two multi-armed bandits, one for task-level search and one for skill-level routing. Combined with append-only memory growth, bounded curriculum coverage, and task-conditioned retrieval, this design supports stable agent evolution without updating the backbone parameters. Across nine benchmarks, including mathematical reasoning, multi-hop question answering, financial numerical reasoning, and medical multiple-choice, MAGE achieves strong performance against prompt-based frozen-backbone baselines. Ablations show that memorized success trajectories and teacher-written corrections are complementary.
📝 Abstract
Self-evolving language-model agents must decide what to learn next and how to preserve what they have learned across iterations. Existing systems typically carry this cross-iteration knowledge as natural-language feedback, flat episodic memory, or implicit reinforcement signals, none of which cleanly supports a frozen weak backbone at inference time. This paper introduces MAGE (Multi-Agent Graph-guided Evolution), a framework that externalizes self-knowledge into a four-subgraph co-evolutionary knowledge graph. Its experience subgraph stores both teacher-written failure corrections and the learner's own past correct reasoning traces, which are retrieved as task-conditioned guidance for a frozen execution model. During evolution, the graph, a task-level search bandit, and a skill-level routing bandit are updated from the same reward stream, while the learner's backbone remains unchanged. We further provide structural analysis showing how append-only memory growth, bounded curriculum coverage, and task-filtered retrieval together support stable improvement of the retrieval substrate for frozen-learner evolution. Across nine benchmarks spanning mathematical reasoning, multi-hop and open-domain question answering, spatio-temporal analysis, financial numerical reasoning, medical multiple-choice, an open-world survival game, and web navigation, MAGE achieves strong performance against prompt-based frozen-backbone baselines. Ablations show that self-harvested success traces and teacher-written corrections are complementary, with success memories contributing most on reasoning-template-heavy tasks and corrective memories supporting harder composition and interaction settings.
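The evolution loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a bandit algorithm, so UCB1 is swapped in here as an assumption, and every identifier (`Experience`, `ExperienceMemory`, `UCB1Bandit`, `evolve_step`, `run_frozen_learner`) is hypothetical. The sketch shows only three properties the abstract states: memory is append-only, retrieval is filtered by task, and both bandits are updated from the same reward while the learner itself is never modified.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Experience:
    """One memory entry (hypothetical schema, not from the paper)."""
    task_type: str
    kind: str  # "success" (learner's own trace) or "correction" (teacher-written)
    text: str


class ExperienceMemory:
    """Append-only store: entries are added, never edited or deleted."""

    def __init__(self):
        self._entries = []

    def append(self, exp):
        self._entries.append(exp)

    def retrieve(self, task_type, k=2):
        # Task-filtered retrieval: only entries for the same task type,
        # returning the k most recently added ones.
        matches = [e for e in self._entries if e.task_type == task_type]
        return matches[-k:]

    def __len__(self):
        return len(self._entries)


class UCB1Bandit:
    """Standard UCB1; an assumed stand-in for the paper's bandits."""

    def __init__(self, arms):
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}

    def select(self):
        # Play every arm once before using the confidence bound.
        for arm, n in self.counts.items():
            if n == 0:
                return arm
        total = sum(self.counts.values())
        return max(
            self.counts,
            key=lambda a: self.values[a]
            + math.sqrt(2 * math.log(total) / self.counts[a]),
        )

    def update(self, arm, reward):
        # Incremental mean of observed rewards for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def evolve_step(memory, task_bandit, skill_bandit, run_frozen_learner):
    """One iteration: pick task and skill, run the frozen learner, update state.

    `run_frozen_learner(task, skill, guidance)` is a hypothetical callable
    returning `(reward, text)`; on failure, `text` stands in for the
    teacher-written correction.
    """
    task = task_bandit.select()
    skill = skill_bandit.select()
    guidance = memory.retrieve(task)
    reward, text = run_frozen_learner(task, skill, guidance)
    # Both bandits and the memory consume the same reward signal.
    task_bandit.update(task, reward)
    skill_bandit.update(skill, reward)
    kind = "success" if reward > 0 else "correction"
    memory.append(Experience(task, kind, text))
    return task, skill, reward
```

Calling `evolve_step` repeatedly with a stub learner grows the memory by one entry per iteration and shifts both bandits' value estimates, while nothing in the loop touches model weights. A real system would replace the recency-based `retrieve` with graph-based retrieval over the four subgraphs, which this sketch does not attempt to model.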