On Structured State-Space Duality

๐Ÿ“… 2025-10-06
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses three fundamental limitations of the Structured State Space Duality (SSD) theory: (1) difficulty in generalizing from scalar-identity to general diagonal state matrices; (2) inherent trade-off between dynamic expressivity and lower bounds on training complexity; and (3) lack of precise characterization of equivalence between State Space Models (SSMs) and attention mechanisms. We extend SSD to SSMs with diagonal state matrices and establish, for the first time, necessary and sufficient conditions for their exact equivalence to 1-semiseparable causal-masked attentionโ€”while rigorously proving that this equivalence does not extend to standard softmax attention. Our approach integrates structured state space modeling, 1-semiseparable matrix theory, and sequence complexity analysis. The core contribution is a theoretical unification of recurrent and attention-based sequence modeling: we preserve the linear lower bound on training complexity while substantially enhancing dynamic modeling capacity, thereby introducing a new paradigm for efficient sequence model design.

๐Ÿ“ Abstract
Structured State-Space Duality (SSD) [Dao & Gu, ICML 2024] is an equivalence between a simple Structured State-Space Model (SSM) and a masked attention mechanism. In particular, a state-space model with a scalar-times-identity state matrix is equivalent to a masked self-attention with a $1$-semiseparable causal mask. Consequently, the same sequence transformation (model) has two algorithmic realizations: as a linear-time $O(T)$ recurrence or as a quadratic-time $O(T^2)$ attention. In this note, we formalize and generalize this duality: (i) we extend SSD from the scalar-identity case to general diagonal SSMs (diagonal state matrices); (ii) we show that these diagonal SSMs match the scalar case's training complexity lower bounds while supporting richer dynamics; (iii) we establish a necessary and sufficient condition under which an SSM is equivalent to $1$-semiseparable masked attention; and (iv) we show that such duality fails to extend to standard softmax attention due to rank explosion. Together, these results tighten the bridge between recurrent SSMs and Transformers, and widen the design space for expressive yet efficient sequence models.
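The scalar-identity duality in the abstract can be checked numerically: unrolling the recurrence $h_t = a_t h_{t-1} + B_t x_t$, $y_t = C_t^\top h_t$ gives $y_t = \sum_{s \le t} \big(\prod_{r=s+1}^{t} a_r\big)\, (C_t^\top B_s)\, x_s$, i.e. masked attention with the $1$-semiseparable mask $L_{ts} = \prod_{r=s+1}^{t} a_r$. A minimal sketch (variable names and the single scalar input channel are illustrative choices, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 6, 4                      # sequence length, state dimension

a = rng.uniform(0.5, 1.0, T)     # scalar-times-identity decay a_t
B = rng.normal(size=(T, N))      # input projections B_t
C = rng.normal(size=(T, N))      # output projections C_t
x = rng.normal(size=T)           # scalar input channel

# Linear-time O(T) recurrent form: h_t = a_t h_{t-1} + B_t x_t, y_t = C_t . h_t
h = np.zeros(N)
y_rec = np.empty(T)
for t in range(T):
    h = a[t] * h + B[t] * x[t]
    y_rec[t] = C[t] @ h

# Quadratic-time O(T^2) attention dual: y = (L * (C B^T)) x, where L is the
# 1-semiseparable causal mask L[t, s] = prod(a[s+1 .. t]) for s <= t
L = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        L[t, s] = np.prod(a[s + 1:t + 1])
y_att = (L * (C @ B.T)) @ x

assert np.allclose(y_rec, y_att)  # same sequence transformation, two algorithms
```

Both realizations compute the identical output; the recurrence does it in $O(T)$ time while the masked-attention form materializes the $T \times T$ mixing matrix.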
Problem

Research questions and friction points this paper is trying to address.

Formalizing duality between state-space models and masked attention
Extending SSD equivalence from scalar to general diagonal SSMs
Establishing conditions for SSM and attention equivalence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Equivalence between state-space models and masked attention
Generalization from scalar to diagonal state matrices
Linear-time recurrence alternative to quadratic attention
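The diagonal generalization listed above admits the same two-sided check: with a per-dimension decay $a_{t,n}$ (a diagonal state matrix), the recurrence decomposes into $N$ independent scalar channels, so the dual form becomes a sum of $N$ masked attentions, each with its own $1$-semiseparable mask. A hedged sketch under the same illustrative setup as the scalar case (names and the scalar input channel are my choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 6, 4                        # sequence length, state dimension

A = rng.uniform(0.5, 1.0, (T, N))  # diagonal decays: one a_{t,n} per state dim
B = rng.normal(size=(T, N))
C = rng.normal(size=(T, N))
x = rng.normal(size=T)

# Linear-time recurrence with a diagonal (elementwise) state matrix
h = np.zeros(N)
y_rec = np.empty(T)
for t in range(T):
    h = A[t] * h + B[t] * x[t]
    y_rec[t] = C[t] @ h

# Dual form: one 1-semiseparable mask per state dimension,
# y = sum_n (L^(n) * outer(C[:, n], B[:, n])) x
y_att = np.zeros(T)
for n in range(N):
    Ln = np.zeros((T, T))
    for t in range(T):
        for s in range(t + 1):
            Ln[t, s] = np.prod(A[s + 1:t + 1, n])
    y_att += (Ln * np.outer(C[:, n], B[:, n])) @ x

assert np.allclose(y_rec, y_att)  # diagonal SSM == sum of masked attentions
```

The recurrent pass stays $O(TN)$, matching the scalar case's linear training-cost profile while each state dimension now carries its own decay schedule.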
Jerry Yao-Chieh Hu
Northwestern University
Machine Learning (* denotes equal contribution)
Xiwen Zhang
not Helixon anymore :)
LLM · diffusion model · computer systems · machine learning · computational biology
Weimin Wu
Ph.D. Candidate in Computer Science, Northwestern University
AI for Biology · ML Theory
Han Liu
Center for Foundation Models and Generative AI, Northwestern University, Evanston, IL 60208, USA; Department of Statistics and Data Science, Northwestern University, Evanston, IL 60208, USA