Characterizing Mamba's Selective Memory using Auto-Encoders

📅 2025-12-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the selective forgetting mechanism of state space models (SSMs)—particularly the Mamba family (130M–1.4B parameters)—under fixed memory constraints when processing long sequences. To address *which semantic types and sequences are more prone to forgetting*, the authors propose an auto-encoder-based latent-state reconstruction framework that quantifies information loss across token categories (e.g., part-of-speech tags, named entities) and sequence domains (e.g., code, mathematical problems). Their systematic analysis reveals that low-frequency tokens—including mathematical symbols, organizational named entities, and dialects other than Standard American English—exhibit significantly higher forgetting rates; crucially, forgetting magnitude is strongly negatively correlated with token frequency in the pretraining corpus. This establishes an interpretable, data-distribution-aware link between forgetting patterns and training statistics, providing both empirical grounding and a diagnostic tool for memory modeling and long-context optimization in SSMs.

📝 Abstract
State space models (SSMs) are a promising alternative to transformers for language modeling because they use fixed memory during inference. However, this fixed memory usage requires some information loss in the hidden state when processing long sequences. While prior work has studied the sequence length at which this information loss occurs, it does not characterize the types of information SSM language models (LMs) tend to forget. In this paper, we address this knowledge gap by identifying the types of tokens (e.g., parts of speech, named entities) and sequences (e.g., code, math problems) that are more frequently forgotten by SSM LMs. We achieve this by training an auto-encoder to reconstruct sequences from the SSM's hidden state, and measure information loss by comparing inputs with their reconstructions. We perform experiments using the Mamba family of SSM LMs (130M--1.4B) on sequences ranging from 4--256 tokens. Our results show significantly higher rates of information loss on math-related tokens (e.g., numbers, variables), mentions of organization entities, and alternative dialects to Standard American English. We then examine the frequency that these tokens appear in Mamba's pretraining data and find that less prevalent tokens tend to be the ones Mamba is most likely to forget. By identifying these patterns, our work provides clear direction for future research to develop methods that better control Mamba's ability to retain important information.
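The fixed-memory bottleneck the abstract describes can be illustrated with a toy linear SSM: once sequence length times vocabulary size exceeds the state dimension, reconstructing the input from the final state must lose information. The sketch below is a hedged illustration, not the paper's method — the dimensions, the linear recurrence, and the closed-form least-squares decoder are all illustrative assumptions (the paper trains a learned auto-encoder on Mamba's hidden states instead).

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(tokens, A, B, vocab):
    """Run a linear SSM recurrence h_t = A h_{t-1} + B x_t over
    one-hot tokens, keeping only the final fixed-size state."""
    h = np.zeros(A.shape[0])
    for t in tokens:
        x = np.zeros(vocab)
        x[t] = 1.0
        h = A @ h + B @ x
    return h

def reconstruct(h, A, B, vocab, T):
    """Least-squares decode of the whole sequence from the final
    state: h = sum_t A^(T-1-t) B x_t is linear in the x_t."""
    cols = [np.linalg.matrix_power(A, T - 1 - t) @ B for t in range(T)]
    M = np.hstack(cols)  # shape: d x (T * vocab)
    x_hat, *_ = np.linalg.lstsq(M, h, rcond=None)
    return x_hat.reshape(T, vocab)

vocab, d = 5, 30
A = rng.standard_normal((d, d))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))  # stable recurrence
B = rng.standard_normal((d, vocab))

def recon_error(T):
    tokens = rng.integers(0, vocab, size=T)
    h = encode(tokens, A, B, vocab)
    x_hat = reconstruct(h, A, B, vocab, T)
    return np.linalg.norm(np.eye(vocab)[tokens] - x_hat)

short_err = recon_error(4)   # 4*5 = 20 <= 30 state dims: lossless
long_err = recon_error(20)   # 20*5 = 100 > 30 state dims: lossy
```

Below the state's capacity the sequence is recovered almost exactly; above it, reconstruction error is unavoidable — the same pressure that forces Mamba to be selective about what it retains.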
Problem

Research questions and friction points this paper is trying to address.

Identifies token types SSM LMs frequently forget
Measures information loss via auto-encoder reconstruction comparison
Links forgetting patterns to token prevalence in pretraining data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using auto-encoders to reconstruct sequences from SSM hidden states
Measuring information loss by comparing inputs with reconstructions
Identifying forgotten token types like math symbols and named entities
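The input-vs-reconstruction comparison above can be sketched as a per-category error rate. This is a minimal illustrative sketch, assuming per-token reconstructions are already available from the auto-encoder; the function name and the parallel-list interface are hypothetical, not the paper's implementation.

```python
def forgetting_rate_by_category(inputs, recons, categories):
    """Fraction of tokens in each category (e.g. POS tag, entity
    type) whose reconstruction differs from the original token.

    inputs, recons, categories are parallel per-token lists; this
    interface is a hypothetical simplification of the paper's setup.
    """
    counts = {}  # category -> (total, wrong)
    for tok, rec, cat in zip(inputs, recons, categories):
        total, wrong = counts.get(cat, (0, 0))
        counts[cat] = (total + 1, wrong + (tok != rec))
    return {cat: wrong / total for cat, (total, wrong) in counts.items()}

# Toy example: math-related tokens are reconstructed less reliably.
inputs = ["let", "x", "=", "3", "then", "x", "+", "1"]
recons = ["let", "x", "=", "7", "then", "x", "*", "1"]
cats   = ["word", "math", "math", "math", "word", "math", "math", "math"]
rates = forgetting_rate_by_category(inputs, recons, cats)
```

Aggregating mismatches by category in this way is what lets the paper report that, e.g., numbers and organization mentions are forgotten more often than common words.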
Tamanna Hossain
University of California, Irvine
Robert L. Logan IV
Dataminr Inc.
Ganesh Jagadeesan
Dataminr Inc.
Sameer Singh
Professor, UC Irvine | CTO, Spiffy AI
AI · Machine Learning · Natural Language Processing · LLMs
Joel Tetreault
Dataminr Inc.
Alejandro Jaimes
Chief AI Officer at Dataminr
LLMs · Computer Vision · NLP