🤖 AI Summary
Mamba models underperform on long-sequence natural language tasks, so their inference-efficiency advantage fails to translate into end-task gains. To address this, we propose a two-stage re-forward framework that integrates selective state compression with intra-layer dynamic adaptation, easing Mamba's long-range dependency bottleneck without significant computational overhead. Our approach preserves the core structured state-space modeling paradigm while enhancing context awareness via a lightweight re-computation scheme. Evaluated on LongBench and L-Eval, it achieves absolute improvements of +3.2 and +1.6 points, respectively, approaching the performance of Transformer models with comparable parameter counts. Crucially, inference latency increases by less than 5%, marking one of the first instances of a Mamba model reaching near performance parity with Transformers on long-text understanding tasks.
📝 Abstract
While the Mamba architecture demonstrates superior inference efficiency and competitive performance on short-context natural language processing (NLP) tasks, empirical evidence suggests its capacity to comprehend long contexts is limited compared to transformer-based models. In this study, we investigate the long-context deficiencies of Mamba models and propose ReMamba, which enhances Mamba's ability to comprehend long contexts. ReMamba incorporates selective compression and adaptation techniques within a two-stage re-forward process, incurring minimal additional inference overhead. Experimental results on the LongBench and L-Eval benchmarks demonstrate ReMamba's efficacy, improving over the baselines by 3.2 and 1.6 points, respectively, and attaining performance almost on par with same-size transformer models.
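The abstract does not specify how the selective compression in the first forward pass works, so the following is only an illustrative sketch of the general idea: score each token's hidden state by its similarity to the sequence's final state, keep the top-scoring fraction, and feed that shortened sequence to the second forward pass. The scoring function, the `keep_ratio` parameter, and the function name are all assumptions for illustration, not ReMamba's actual method.

```python
import numpy as np

def selective_compress(hidden_states, final_state, keep_ratio=0.25):
    """Hypothetical stage-1 compression: rank token hidden states by
    cosine similarity to the final state and keep the top fraction,
    preserving their original order for the re-forward pass."""
    # hidden_states: (seq_len, d); final_state: (d,)
    norms = np.linalg.norm(hidden_states, axis=1) * np.linalg.norm(final_state)
    scores = hidden_states @ final_state / (norms + 1e-8)
    k = max(1, int(len(hidden_states) * keep_ratio))
    keep = np.sort(np.argsort(scores)[-k:])  # top-k indices, in sequence order
    return hidden_states[keep], keep

# Toy usage: compress a 16-token sequence of 8-dim states to 4 tokens.
rng = np.random.default_rng(0)
h = rng.standard_normal((16, 8))
compressed, idx = selective_compress(h, h[-1], keep_ratio=0.25)
```

Because the final state has cosine similarity 1 with itself, the last token is always retained in this toy scoring scheme; a real system would learn or tune the importance scores rather than use raw similarity.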