The Expressive Capacity of State Space Models: A Formal Language Perspective

📅 2024-05-27
🏛️ Neural Information Processing Systems
📈 Citations: 24
Influential: 9
🤖 AI Summary
This work investigates the fundamental expressive capacity of linear state-space models (SSMs) for language modeling, clarifying their theoretical modeling boundaries relative to Transformers and classical RNNs. Method: Drawing on formal language and automata theory, the authors characterize SSM expressivity, showing that linear SSMs implement exact solutions to star-free state-tracking problems and can model bounded hierarchical structure with optimal memory, without simulating a stack. They also identify an expressivity bottleneck in contemporary SSM designs arising from the absence of nonlinearity in state updates. Contribution/Results: The analysis reveals that SSMs and Transformers have overlapping but distinct strengths, complementary rather than substitutive. Empirical evaluation on the Mamba architecture verifies the theory, including exact solutions to star-free tasks that Transformers struggle to represent and memory-efficient modeling of hierarchical structure. These findings provide both theoretical foundations and practical guidance for designing the next generation of efficient language model architectures.

📝 Abstract
Recently, recurrent models based on linear state space models (SSMs) have shown promising performance in language modeling (LM), competitive with transformers. However, there is little understanding of the in-principle abilities of such models, which could provide useful guidance to the search for better LM architectures. We present a comprehensive theoretical study of the capacity of such SSMs as it compares to that of transformers and traditional RNNs. We find that SSMs and transformers have overlapping but distinct strengths. In star-free state tracking, SSMs implement straightforward and exact solutions to problems that transformers struggle to represent exactly. They can also model bounded hierarchical structure with optimal memory even without simulating a stack. On the other hand, we identify a design choice in current SSMs that limits their expressive power. We discuss implications for SSM and LM research, and verify results empirically on a recent SSM, Mamba.
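The abstract's claim that SSMs implement "straightforward and exact solutions" to star-free state tracking can be illustrated with a minimal sketch (my own toy example, not the paper's construction): a diagonal linear recurrence h_t = a(x_t) * h_{t-1} + b(x_t) with input-dependent gates exactly solves the flip-flop task, a classic star-free state-tracking problem where the model must remember the last written bit across arbitrarily many "ignore" steps.

```python
# Toy illustration (assumption: mine, not from the paper) of a linear SSM
# update h_t = a(x_t) * h_{t-1} + b(x_t) exactly solving flip-flop state
# tracking. Tokens: 'w0'/'w1' write a bit, 'i' ignores the input.
GATES = {
    "w0": (0.0, 0.0),  # a=0, b=0: overwrite state with 0
    "w1": (0.0, 1.0),  # a=0, b=1: overwrite state with 1
    "i":  (1.0, 0.0),  # a=1, b=0: carry state through unchanged
}

def flip_flop(tokens, h0=0.0):
    """Run the scalar linear recurrence h = a*h + b over the token sequence."""
    h = h0
    for tok in tokens:
        a, b = GATES[tok]
        h = a * h + b
    return h

print(flip_flop(["w1", "i", "i", "w0", "i"]))  # -> 0.0 (last write was 0)
```

The solution is exact for sequences of any length: the gate values are 0 or 1, so no information decays over the "ignore" steps, in contrast to soft attention, which must spread probability mass over positions.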
Problem

Research questions and friction points this paper is trying to address.

How does the expressive power of SSMs compare to that of transformers and RNNs?
Why do SSMs handle star-free state tracking better than transformers?
Which design choices in current SSMs limit their expressiveness?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exact SSM solutions to star-free state-tracking problems
Optimal-memory modeling of bounded hierarchical structure without simulating a stack
Identification of a design choice that limits the expressive power of current SSMs