🤖 AI Summary
This work investigates whether state space models (SSMs) have greater expressive power than transformers for state-tracking tasks. Using circuit complexity theory, the authors formally prove that SSMs, like transformers, are confined to the complexity class TC⁰ and therefore cannot express inherently sequential state-tracking problems such as permutation composition; as consequences, SSMs provably cannot reliably track chess moves in certain notations, evaluate code, or track entities in a long narrative. The analysis combines circuit-complexity arguments with empirical evaluation of Mamba-style architectures on state-tracking benchmarks, and the experiments confirm the theoretical prediction: SSMs indeed struggle with state tracking. The core contribution is a complexity-theoretic refutation of SSMs' purported state advantage: despite their recurrent formulation, the "state" in an SSM is an illusion, and SSMs share the same TC⁰ expressiveness bottleneck as non-recurrent transformers.
📝 Abstract
State-space models (SSMs) have emerged as a potential alternative to the previously ubiquitous transformer architecture for building large language models (LLMs). One theoretical weakness of transformers is that they cannot express certain kinds of sequential computation and state tracking (Merrill & Sabharwal, 2023), which SSMs are explicitly designed to address via their close architectural similarity to recurrent neural networks (RNNs). But do SSMs truly have an advantage (over transformers) in expressive power for state tracking? Surprisingly, the answer is no. Our analysis reveals that the expressive power of SSMs is limited very similarly to transformers: SSMs cannot express computation outside the complexity class $\mathsf{TC}^0$. In particular, this means they cannot solve simple state-tracking problems like permutation composition. It follows that SSMs are provably unable to accurately track chess moves with certain notation, evaluate code, or track entities in a long narrative. To supplement our formal analysis, we report experiments showing that Mamba-style SSMs indeed struggle with state tracking. Thus, despite its recurrent formulation, the "state" in an SSM is an illusion: SSMs have similar expressiveness limitations to non-recurrent models like transformers, which may fundamentally limit their ability to solve real-world state-tracking problems.
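To make the permutation-composition task concrete, here is a minimal sketch (our own illustration, not code from the paper) of the $S_5$ word problem: a model is fed a sequence of permutations of five elements and must output their running composition, i.e. the true "state" after each step. Computing this state is what the paper argues lies outside $\mathsf{TC}^0$; the brute-force fold below is trivial for a classical program but, per the theory, not expressible by a fixed-depth SSM or transformer.

```python
# Sketch of the S_5 word problem (permutation composition), the canonical
# state-tracking task the paper shows is inexpressible in TC^0.
# Function names here are illustrative, not from the paper's code.
from itertools import permutations
import random

def compose(p, q):
    """Compose two permutations given as tuples: (p ∘ q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))

def run_word_problem(seq):
    """Fold a sequence of S_5 permutations into the final state permutation."""
    state = tuple(range(5))  # start from the identity permutation
    for perm in seq:
        state = compose(perm, state)
    return state

# Example: a random input sequence, as a benchmark generator might produce.
random.seed(0)
S5 = list(permutations(range(5)))  # all 120 elements of S_5
seq = [random.choice(S5) for _ in range(8)]
print(run_word_problem(seq))
```

Because $S_5$ is a non-solvable group, its word problem is $\mathsf{NC}^1$-complete, so no $\mathsf{TC}^0$ architecture can solve it for arbitrary sequence lengths; this is the precise sense in which the SSM's recurrent "state" fails to buy extra expressive power.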