🤖 AI Summary
Transformer-based large language models (LLMs) lack an explicit, interpretable mechanism for memorizing text. To address this, we propose MeMo, a novel architecture embodying the “memory-before-learning” paradigm. MeMo introduces a layered associative memory module that enables explicit, transparent, and controllable storage, retrieval, and forgetting of token-level textual content, in contrast to the implicit memory held in learned parameters. Its core contributions are threefold: (1) a fully editable and interpretable explicit memory interface; (2) precise memory writing and retrieval without any training or parameter updates; and (3) flexible support for both single-layer and cross-layer associative modeling. Experiments with single-layer and multi-layer configurations demonstrate that MeMo enhances memory controllability, traceability, and intervenability while maintaining computational efficiency, offering a principled pathway toward human-like memory mechanisms in foundation models.
📝 Abstract
Memorization is a fundamental ability of Transformer-based Large Language Models, achieved through learning. In this paper, we propose a paradigm shift: designing an architecture that memorizes text directly, following the principle that memorization precedes learning. We introduce MeMo, a novel architecture for language modeling that explicitly memorizes sequences of tokens in layered associative memories. By design, MeMo offers transparency and the possibility of model editing, including forgetting texts. Our experiments with the MeMo architecture demonstrate the memorization power of both the one-layer and the multi-layer configurations.
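To make the idea of explicit storage, retrieval, and forgetting in an associative memory concrete, here is a minimal sketch of a classic correlation-matrix memory. It is not MeMo's actual implementation, which the abstract does not specify. The class `ToyAssociativeMemory`, the helper `rand_vec`, the dimension `D`, the string-seeded random vectors, and the retrieval threshold are all hypothetical choices made for illustration.

```python
import random

D = 256  # hypothetical vector dimension (an assumption, not taken from the paper)

def rand_vec(name):
    """Deterministic pseudo-random vector with entries +-1/sqrt(D), seeded by a string."""
    rng = random.Random(name)
    s = D ** -0.5
    return [s if rng.random() < 0.5 else -s for _ in range(D)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class ToyAssociativeMemory:
    """Stores (context -> next token) pairs as a sum of outer products."""

    def __init__(self):
        self.M = [[0.0] * D for _ in range(D)]  # memory matrix, initially empty
        self.vocab = set()                      # tokens seen so far (decoding candidates)

    def _update(self, context, token, sign):
        k = rand_vec("ctx:" + " ".join(context))  # key vector for the context
        v = rand_vec("tok:" + token)              # value vector for the token
        for i in range(D):
            vi = sign * v[i]
            row = self.M[i]
            for j in range(D):
                row[j] += vi * k[j]               # M += sign * outer(v, k)

    def store(self, context, token):
        # Writing is a single additive update; no training is involved.
        self.vocab.add(token)
        self._update(context, token, +1.0)

    def forget(self, context, token):
        # Forgetting subtracts exactly the outer product that was added.
        self._update(context, token, -1.0)

    def retrieve(self, context, threshold=0.5):
        k = rand_vec("ctx:" + " ".join(context))
        out = [dot(row, k) for row in self.M]     # out = M @ k
        scores = {t: dot(out, rand_vec("tok:" + t)) for t in self.vocab}
        if not scores:
            return None
        best = max(scores, key=scores.get)
        # Below the threshold, only crosstalk from other stored pairs remains.
        return best if scores[best] >= threshold else None

mem = ToyAssociativeMemory()
mem.store(("the", "cat"), "sat")
mem.store(("cat", "sat"), "on")
print(mem.retrieve(("the", "cat")))
mem.forget(("the", "cat"), "sat")
print(mem.retrieve(("the", "cat")))
```

Because random high-dimensional vectors are nearly orthogonal, each stored pair can be read back with little interference, and subtracting a pair's outer product removes it without touching the others. This is one simple way to obtain the kind of transparency and editability the abstract describes; how MeMo stacks such memories into layers is not covered by this sketch.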