On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages

📅 2024-12-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates the expressiveness and length generalization of selective state-space models (SSMs) on regular language tasks, i.e., finite-state automaton (FSA) emulation, and addresses certain limitations of modern SSM-based architectures. The authors propose the Selective Dense SSM (SD-SSM), the first selective SSM to exhibit perfect length generalization on a set of regular language tasks using a single layer. SD-SSM combines a dictionary of dense transition matrices, a softmax selection mechanism that forms a convex combination of the dictionary matrices at each time step, and a readout consisting of layer normalization followed by a linear map. The paper also evaluates variants of diagonal selective SSMs on commutative and non-commutative automata and explains the empirical results with theoretical considerations.
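To make "FSA emulation" and the commutative versus non-commutative distinction concrete, the sketch below encodes two small automata as 0/1 transition matrices acting on a one-hot state. The two automata (a mod-3 counter and a permutation automaton on three states) and all function names are illustrative assumptions, not the paper's benchmark set. One relevant fact is that diagonal matrices always commute, so a non-commutative automaton cannot be encoded directly with one fixed diagonal matrix per symbol; whether this matches the paper's own theoretical argument is not stated in the summary.

```python
# Illustrative only: textbook automata, not the paper's benchmarks.

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(delta, n_states):
    """0/1 matrix M with M[delta[s]][s] = 1, so M @ e_s = e_{delta[s]}."""
    m = [[0] * n_states for _ in range(n_states)]
    for s, t in enumerate(delta):
        m[t][s] = 1
    return m

def run_fsa(matrices, word, n_states, start=0):
    """Emulate the automaton by multiplying a one-hot state by one matrix per symbol."""
    state = [1 if i == start else 0 for i in range(n_states)]
    for sym in word:
        m = matrices[sym]
        state = [sum(m[i][j] * state[j] for j in range(n_states))
                 for i in range(n_states)]
    return state.index(1)

def commutes(matrices):
    """True if every pair of transition matrices commutes."""
    ms = list(matrices.values())
    return all(matmul(a, b) == matmul(b, a) for a in ms for b in ms)

# Commutative: counter mod 3 ("a" adds 1, "b" adds 2).
cyclic = {"a": transition_matrix([1, 2, 0], 3),
          "b": transition_matrix([2, 0, 1], 3)}

# Non-commutative: "s" swaps states 0 and 1, "r" rotates all three states.
perm = {"s": transition_matrix([1, 0, 2], 3),
        "r": transition_matrix([1, 2, 0], 3)}

print(commutes(cyclic), commutes(perm))
print(run_fsa(perm, "sr", 3), run_fsa(perm, "rs", 3))
```

For the permutation automaton, reading "sr" and "rs" from the same start state ends in different states, which is exactly what order sensitivity means here.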

📝 Abstract

Selective state-space models (SSMs) are an emerging alternative to the Transformer, offering the unique advantage of parallel training and sequential inference. Although these models have shown promising performance on a variety of tasks, their formal expressiveness and length generalization properties remain underexplored. In this work, we provide insight into the workings of selective SSMs by analyzing their expressiveness and length generalization performance on regular language tasks, i.e., finite-state automaton (FSA) emulation. We address certain limitations of modern SSM-based architectures by introducing the Selective Dense State-Space Model (SD-SSM), the first selective SSM that exhibits perfect length generalization on a set of various regular language tasks using a single layer. It utilizes a dictionary of dense transition matrices, a softmax selection mechanism that creates a convex combination of dictionary matrices at each time step, and a readout consisting of layer normalization followed by a linear map. We then proceed to evaluate variants of diagonal selective SSMs by considering their empirical performance on commutative and non-commutative automata. We explain the experimental results with theoretical considerations. Our code is available at https://github.com/IBM/selective-dense-state-space-model.
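The three components named in the abstract (a dictionary of dense transition matrices, a softmax selection producing a convex combination per time step, and a layer-norm plus linear readout) can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the state update is taken to be `x_t = A_t x_{t-1}` with no additive input term, the selection logits are a plain linear map of the input, layer normalization has no learned scale or shift, and the weights in the parity example are set by hand rather than trained. All names (`sd_ssm_forward`, `select_w`, `readout_w`, `parity_prediction`) are hypothetical; the linked repository is the reference for the actual parameterization.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def layer_norm(x, eps=1e-5):
    # Plain normalization without learned scale/shift (an assumption).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sd_ssm_forward(inputs, dictionary, select_w, readout_w, x0):
    """inputs: list of input vectors u_t; dictionary: K dense n x n matrices;
    select_w: K x d selection weights; readout_w: m x n; x0: initial state."""
    n, k_count = len(x0), len(dictionary)
    x = list(x0)
    outputs = []
    for u in inputs:
        # Softmax weights are non-negative and sum to 1: a convex combination.
        alpha = softmax(matvec(select_w, u))
        a_t = [[sum(alpha[k] * dictionary[k][i][j] for k in range(k_count))
                for j in range(n)] for i in range(n)]
        x = matvec(a_t, x)                      # assumed update: x_t = A_t x_{t-1}
        outputs.append(matvec(readout_w, layer_norm(x)))
    return outputs

# Hand-set (untrained) parity example: symbol 0 selects identity, symbol 1 selects swap.
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
select_w = [[20.0, 0.0], [0.0, 20.0]]   # large logits give a nearly one-hot selection
readout_w = identity

def parity_prediction(bits):
    inputs = [[1.0, 0.0] if b == 0 else [0.0, 1.0] for b in bits]
    out = sd_ssm_forward(inputs, [identity, swap], select_w, readout_w, [1.0, 0.0])
    last = out[-1]
    return last.index(max(last))

print(parity_prediction([1, 0, 1, 1]))
```

With these hand-set weights the softmax is close to one-hot, so each step applies nearly the exact automaton transition, and the layer normalization in the readout rescales the state before the linear map.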

Problem

Research questions and friction points this paper is trying to address.

Selective State Space Models

Variable Input Lengths

Regular Language Tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective Dense State Space Model

Adaptive Sequence Processing

Single-Layer Length Generalization on Regular Languages

🔎 Similar Papers

The Expressive Capacity of State Space Models: A Formal Language Perspective