DeciMamba: Exploring the Length Extrapolation Potential of Mamba

📅 2024-06-20
🏛️ arXiv.org
📈 Citations: 4
Influential: 1
📄 PDF
🤖 AI Summary
This work identifies a fundamental limitation of Mamba models: their effective receptive field is constrained by the sequence length used during training, leading to severe degradation when generalizing to longer, unseen sequences. To address this, the authors propose DeciMamba, a context-extension method designed specifically for Mamba that builds on a hidden filtering mechanism embedded within the S6 state-space layer. Crucially, DeciMamba requires no architectural modifications and no additional training, enabling a trained model to extrapolate to contexts up to 25× longer than those seen during training. Evaluation on real-world long-context NLP tasks demonstrates robust extrapolation without additional computational resources, easing the context-scaling bottleneck in modern state-space models.

📝 Abstract
Long-range sequence processing poses a significant challenge for Transformers due to their quadratic complexity in input length. A promising alternative is Mamba, which achieves Transformer-level capabilities while requiring substantially fewer computational resources. In this paper we explore the length-generalization capabilities of Mamba, which we find to be relatively limited. Through a series of visualizations and analyses, we identify that the limitations arise from a restricted effective receptive field, dictated by the sequence length used during training. To address this constraint, we introduce DeciMamba, a context-extension method specifically designed for Mamba. This mechanism, built on top of a hidden filtering mechanism embedded within the S6 layer, enables the trained model to extrapolate well even without additional training. Empirical experiments over real-world long-range NLP tasks show that DeciMamba can extrapolate to context lengths 25× longer than the ones seen during training, and does so without utilizing additional computational resources. We will release our code and models.
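The abstract describes a filtering mechanism that lets a trained Mamba model handle contexts far longer than those seen during training. The sketch below illustrates the general idea of such token filtering in a hedged form: it assumes each token carries a per-token importance score (here taken from the S6 selection mechanism's discretization step, often denoted Δt) and retains only the highest-scoring tokens while preserving their order. The function name `decimate`, the `keep_ratio` parameter, and the use of Δt as the score are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def decimate(hidden_states, delta_t, keep_ratio=0.5):
    """Hypothetical sketch of importance-based token decimation.

    hidden_states: (seq_len, d_model) activations entering a layer.
    delta_t:       (seq_len,) per-token score, assumed here to come from
                   the S6 selection mechanism's discretization step.
    keep_ratio:    fraction of tokens to retain (illustrative knob).
    """
    seq_len = hidden_states.shape[0]
    n_keep = max(1, int(seq_len * keep_ratio))
    # Treat tokens with the largest scores as most informative,
    # then sort the surviving indices to preserve sequence order.
    keep_idx = np.sort(np.argsort(delta_t)[-n_keep:])
    return hidden_states[keep_idx], keep_idx

# Toy usage: 8 tokens of dimension 2; keep the 4 highest-scoring ones.
x = np.arange(16, dtype=float).reshape(8, 2)
dt = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4])
pooled, idx = decimate(x, dt, keep_ratio=0.5)
print(idx)  # retained token indices, in original order
```

Because the retained sequence is shorter, downstream layers effectively see a compressed context, which is one plausible way a fixed effective receptive field can be stretched over a much longer input.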
Problem

Research questions and friction points this paper is trying to address.

Mamba Model
Sequence Length
Predictive Limitations
Innovation

Methods, ideas, or system contributions that make the work stand out.

DeciMamba
Long Sequence Processing
Efficiency Improvement