Memory Mosaics

📅 2024-05-10
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the poor interpretability of Transformer models by proposing Memory Mosaics, a modular architecture grounded in associative memory networks. The method employs multiple retrievable and editable memory units operating in concert to achieve predictive disentanglement: each prediction step is explicitly traceable to specific memory components and retrieval paths. Unlike opaque Transformer baselines, Memory Mosaics attain comparable or superior performance on medium-scale language modeling tasks while substantially improving transparency and controllability. Causal traceability is further validated on controlled synthetic tasks. The core contribution is realizing fine-grained, dynamic, retrieval-based memory composition and predictive disentanglement without sacrificing Transformer-level modeling capacity.
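
To make the retrieval mechanism concrete, here is a minimal sketch (not the authors' released code) of one associative memory unit as the summary describes it: it stores (key, value) pairs explicitly and retrieves by kernel-weighted averaging over stored keys, so every prediction can be traced back to the pairs that contributed. The class name `AssociativeMemory` and the bandwidth parameter `beta` are our own illustrative choices.

```python
# Hedged sketch of one associative memory unit, assuming softmax-style
# kernel retrieval over explicitly stored (key, value) pairs.
import numpy as np

class AssociativeMemory:
    def __init__(self, beta: float = 1.0):
        self.beta = beta          # sharpness of the retrieval kernel (our parameter)
        self.keys, self.values = [], []

    def store(self, key: np.ndarray, value: np.ndarray) -> None:
        # Memories are explicit and editable: each (key, value) pair can be
        # inspected or removed, which is what makes retrieval traceable.
        self.keys.append(key)
        self.values.append(value)

    def retrieve(self, query: np.ndarray) -> np.ndarray:
        # Kernel-smoothed (softmax-weighted) average of stored values; the
        # weights themselves expose the "retrieval path" for this prediction.
        K = np.stack(self.keys)                # (n, d)
        scores = self.beta * (K @ query)       # (n,) similarity to each stored key
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ np.stack(self.values)       # weighted combination of values
```

Because the weights `w` are computed per query, one can read off exactly which stored pairs drove a given output, which is the transparency property the summary emphasizes.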

📝 Abstract
Memory Mosaics are networks of associative memories working in concert to achieve a prediction task of interest. Like transformers, memory mosaics possess compositional capabilities and in-context learning capabilities. Unlike transformers, memory mosaics achieve these capabilities in a comparatively transparent way ("predictive disentanglement"). We illustrate these capabilities on a toy example and also show that memory mosaics perform as well or better than transformers on medium-scale language modeling tasks.
Problem

Research questions and friction points this paper is trying to address.

Can predictive disentanglement be achieved transparently, rather than emerging opaquely as it does in transformers?
Can a transparent, memory-based architecture match or exceed transformers on medium-scale language modeling?
Can compositional and in-context learning capabilities be obtained without sacrificing interpretability?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Networks of associative memories for prediction tasks (see the sketch after this list)
Compositional and in-context learning capabilities
Transparent predictive disentanglement compared to transformers
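
As referenced in the first bullet above, below is a hedged sketch of how several such units might work "in concert", reusing the `AssociativeMemory` class sketched earlier. Each unit sees the input through its own feature map and contributes a retrieval; combining them by simple averaging is our assumption for illustration, not necessarily the paper's exact readout.

```python
import numpy as np

def mosaic_predict(units, featurizers, x):
    """Combine retrievals from several associative memory units.

    units: list of AssociativeMemory instances (sketched above)
    featurizers: one key-extraction function per unit (hypothetical)
    """
    # Each unit retrieves with its own view of the input; because each
    # retrieval is a weighted sum over that unit's stored pairs, the
    # contribution of every memory to the prediction stays inspectable.
    outputs = [unit.retrieve(phi(x)) for unit, phi in zip(units, featurizers)]
    return np.mean(outputs, axis=0)  # simple average readout (our assumption)
```

Under this view, predictive disentanglement corresponds to different units specializing in different, separately inspectable aspects of the prediction.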
👥 Authors
Jianyu Zhang
FAIR, Meta Platforms Inc., New York, USA
Niklas Nolte
FAIR, Meta Platforms Inc., New York, USA
Ranajoy Sadhukhan
ECE, Carnegie Mellon University, Pittsburgh, USA
Beidi Chen
Carnegie Mellon University
Léon Bottou
FAIR, Meta Platforms Inc., New York, USA