Block-Biased Mamba for Long-Range Sequence Processing

📅 2025-05-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Although Mamba is designed for modeling long-range dependencies, its performance degrades significantly on long-sequence tasks. Method: We systematically diagnose its fundamental limitations from three perspectives—representational capacity, inductive bias, and training stability—and propose B₂S₆, a novel state-space module integrating block-wise selective dynamics and channel-specific learnable biases. B₂S₆ combines block processing, input-dependent selective gating, and channel-adaptive biasing within the SSM framework. Contribution/Results: Theoretically, we prove that B₂S₆ jointly strengthens inductive bias, representation power, and optimization stability. Empirically, B₂S₆ outperforms S4 and S4D across all tasks in the Long-Range Arena benchmark while fully preserving Mamba’s strong performance on standard language modeling tasks. This yields substantial gains in both generality and robustness, establishing a new state-of-the-art for sequence modeling with linear-time complexity.

Technology Category

Application Category

📝 Abstract
Mamba extends earlier state space models (SSMs) by introducing input-dependent dynamics, and has demonstrated strong empirical performance across a range of domains, including language modeling, computer vision, and foundation models. However, a surprising weakness remains: despite being built on architectures designed for long-range dependencies, Mamba performs poorly on long-range sequential tasks. Understanding and addressing this gap is important for improving Mamba's universality and versatility. In this work, we analyze Mamba's limitations through three perspectives: expressiveness, inductive bias, and training stability. Our theoretical results show how Mamba falls short in each of these aspects compared to earlier SSMs such as S4D. To address these issues, we propose $ ext{B}_2 ext{S}_6$, a simple extension of Mamba's S6 unit that combines block-wise selective dynamics with a channel-specific bias. We prove that these changes equip the model with a better-suited inductive bias and improve its expressiveness and stability. Empirically, $ ext{B}_2 ext{S}_6$ outperforms S4 and S4D on Long-Range Arena (LRA) tasks while maintaining Mamba's performance on language modeling benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Mamba performs poorly on long-range sequential tasks
Mamba lacks expressiveness, inductive bias, and training stability
Proposing B2S6 to improve Mamba's long-range performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Block-wise selective dynamics enhance Mamba
Channel-specific bias improves model stability
Combined changes boost long-range task performance
🔎 Similar Papers
No similar papers found.