On the Expressivity of Selective State-Space Layers: A Multivariate Polynomial Approach

📅 2025-02-04

📈 Citations: 0

✨ Influential: 0

career value

202K/year

🤖 AI Summary

This work addresses the lack of theoretical characterization of the expressive power of selective state-space models (SSMs)—the core component of Mamba. We establish, for the first time, a rigorous expressivity comparison framework between selective SSMs and linear Transformers. Leveraging multivariate polynomial approximation theory and state-space model analysis, we prove that, under equal parameter budgets, selective SSMs can exactly represent a strictly broader class of long-range dependency functions than linear Transformers; this advantage is particularly pronounced in long-sequence modeling and does not compromise generalization performance. Our theoretical findings are empirically validated across multiple long-sequence benchmarks—including PG19 and ImageNet-64—demonstrating consistent gains in modeling capacity and predictive accuracy. This constitutes the first formal proof of expressive superiority for Mamba-style architectures over linearized Transformer variants.

Technology Category

Application Category

📝 Abstract

Recent advances in efficient sequence modeling have introduced selective state-space layers, a key component of the Mamba architecture, which have demonstrated remarkable success in a wide range of NLP and vision tasks. While Mamba's empirical performance has matched or surpassed SoTA transformers on such diverse benchmarks, the theoretical foundations underlying its powerful representational capabilities remain less explored. In this work, we investigate the expressivity of selective state-space layers using multivariate polynomials, and prove that they surpass linear transformers in expressiveness. Consequently, our findings reveal that Mamba offers superior representational power over linear attention-based models for long sequences, while not sacrificing their generalization. Our theoretical insights are validated by a comprehensive set of empirical experiments on various datasets.

Problem

Research questions and friction points this paper is trying to address.

explore theoretical foundations of selective state-space layers

compare expressiveness with linear transformers

validate representational power on diverse datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective state-space layers

Multivariate polynomial approach

Superior representational power

🔎 Similar Papers

S4Sleep: Elucidating the design space of deep-learning-based sleep stage classification models