AI Summary
This work addresses catastrophic forgetting in continual learning: agents struggle to acquire new knowledge without degrading previously learned capabilities, and approaches that rely on external memory do not enhance the model's intrinsic capacity. The authors propose MoLEM, a framework that builds a generative latent memory system on a dynamic Mixture-of-Experts (MoE), treating individual experts as independent memory units. A router selects and weights the relevant experts through key-query matching. Each training stage is paired with a lightweight autoencoder that selects the appropriate routing group at inference, and the aggregated latent memory is injected into the reasoning process of a frozen base model, so knowledge is internalized without updating the base model's parameters. Evaluated on continual-learning sequences spanning mathematics, science, and code, MoLEM improves average accuracy by 10.40% over the Vanilla pretrained baseline, while none of the competing methods consistently exceeds that baseline across training orders.
Abstract
Achieving self-evolution in intelligent agents requires the continual accumulation of new knowledge across changing task sequences without forgetting previously acquired abilities. Existing approaches either internalize knowledge by updating model parameters, which induces catastrophic forgetting, or rely on external memory, which fails to genuinely enhance the model's intrinsic capabilities. We propose MoLEM, a generative mixture of latent memory framework based on a dynamic mixture-of-experts (MoE). We treat multiple experts as independent carriers to generate memory. A router selects and weights experts through key-query matching, and the aggregated latent memory is injected into the reasoning process. The base model for reasoning remains entirely frozen, with all experiential knowledge internalized into the additional modules, avoiding catastrophic forgetting. For continual learning, each training stage is paired with a lightweight autoencoder that selects the appropriate routing group at inference, and inputs that match no stage fall back to the pretrained model. Experiments train the framework on continual-learning sequences spanning math, science, and code domains. After training, we evaluate the framework on the corresponding test sets to measure task learning and competence preservation across continual adaptation stages. After the full continual-learning sequence, our method improves the average accuracy by 10.40% over the Vanilla pretrained baseline, while none of the competing methods consistently exceed this baseline across different training orders.
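The two mechanisms named in the abstract, key-query routing over experts and per-stage selection with a fallback, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the scoring function, the number of selected experts, or how an autoencoder decides that an input matches a stage, so the sketch assumes dot-product scores, top-k selection with softmax weights, and a reconstruction-error threshold. All function names, the `top_k` and `threshold` parameters, and the toy tied-weight linear "autoencoder" are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(query, expert_keys, top_k=2):
    """Assumed routing rule: score each expert key against the query,
    keep the top_k experts, and softmax-normalize their scores."""
    scores = [dot(query, key) for key in expert_keys]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = order[:top_k]
    weights = softmax([scores[i] for i in chosen])
    return list(zip(chosen, weights))


def aggregate_memory(routing, expert_memories):
    """Weighted sum of the latent memories produced by the chosen experts."""
    dim = len(expert_memories[0])
    memory = [0.0] * dim
    for index, weight in routing:
        for d in range(dim):
            memory[d] += weight * expert_memories[index][d]
    return memory


def reconstruction_error(x, direction):
    """Toy stand-in for a stage autoencoder (assumption): encode by
    projecting x onto a unit direction, decode by scaling that direction,
    and return the Euclidean norm of the residual."""
    code = dot(x, direction)
    recon = [code * d for d in direction]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, recon)))


def select_stage(x, stage_directions, threshold):
    """Pick the stage whose autoencoder reconstructs x best. Return None
    when no stage is below the threshold, which signals a fallback to the
    frozen pretrained model without injected memory."""
    errors = [reconstruction_error(x, d) for d in stage_directions]
    best = min(range(len(errors)), key=lambda i: errors[i])
    return best if errors[best] <= threshold else None
```

For example, `route([2.0, 1.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], top_k=2)` keeps experts 0 and 1 and gives expert 0 the larger weight, and `select_stage` returns `None` for an input that no stage reconstructs within the threshold. Because the selection step only chooses which routing group to use, the base model's weights are never touched, which is the property the abstract relies on to avoid catastrophic forgetting.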