On the Expressivity of Selective State-Space Layers: A Multivariate Polynomial Approach

📅 2025-02-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the lack of theoretical characterization of the expressive power of selective state-space models (SSMs)—the core component of Mamba. We establish, for the first time, a rigorous expressivity comparison framework between selective SSMs and linear Transformers. Leveraging multivariate polynomial approximation theory and state-space model analysis, we prove that, under equal parameter budgets, selective SSMs can exactly represent a strictly broader class of long-range dependency functions than linear Transformers; this advantage is particularly pronounced in long-sequence modeling and does not compromise generalization performance. Our theoretical findings are empirically validated across multiple long-sequence benchmarks—including PG19 and ImageNet-64—demonstrating consistent gains in modeling capacity and predictive accuracy. This constitutes the first formal proof of expressive superiority for Mamba-style architectures over linearized Transformer variants.

Technology Category

Application Category

📝 Abstract
Recent advances in efficient sequence modeling have introduced selective state-space layers, a key component of the Mamba architecture, which have demonstrated remarkable success in a wide range of NLP and vision tasks. While Mamba's empirical performance has matched or surpassed SoTA transformers on such diverse benchmarks, the theoretical foundations underlying its powerful representational capabilities remain less explored. In this work, we investigate the expressivity of selective state-space layers using multivariate polynomials, and prove that they surpass linear transformers in expressiveness. Consequently, our findings reveal that Mamba offers superior representational power over linear attention-based models for long sequences, while not sacrificing their generalization. Our theoretical insights are validated by a comprehensive set of empirical experiments on various datasets.
Problem

Research questions and friction points this paper is trying to address.

explore theoretical foundations of selective state-space layers
compare expressiveness with linear transformers
validate representational power on diverse datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective state-space layers
Multivariate polynomial approach
Superior representational power
🔎 Similar Papers
No similar papers found.