The Effect of Depth on the Expressivity of Deep Linear State-Space Models

📅 2025-06-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the impact of depth on the expressive power of deep linear state-space models (SSMs) under parameter norm constraints, focusing on the depth-width trade-off. Using constructive proofs and linear systems theory, it gives a rigorous characterization of how deep SSMs with bounded parameter norms can emulate shallow SSMs that would otherwise require arbitrarily large norms, and derives upper bounds on the minimal depth needed for such emulation. The theory shows that, under spectral norm constraints, depth enables exponential compression of the parameter norms needed to realize equivalent input-output mappings, whereas increasing width alone cannot yield comparable compression. Numerical experiments corroborate the expressive advantage conferred by depth and validate the theoretical bounds, revealing the indispensable role of depth in structured sequence modeling.

📝 Abstract
Deep state-space models (SSMs) have gained increasing popularity in sequence modelling. While there are numerous theoretical investigations of shallow SSMs, how the depth of an SSM affects its expressiveness remains a crucial open problem. In this paper, we systematically investigate the roles of depth and width in deep linear SSMs, aiming to characterize how they influence the expressive capacity of the architecture. First, we rigorously prove that in the absence of parameter constraints, increasing depth and increasing width are generally equivalent, provided that the parameter count remains within the same order of magnitude. However, under the assumption that the parameter norms are constrained, the effects of depth and width differ significantly. We show, via a constructive method, that a shallow linear SSM with large parameter norms can be represented by a deep linear SSM with smaller norms. In particular, this demonstrates that under norm constraints, deep SSMs are more capable than shallow SSMs of representing targets with large norms. Finally, we derive upper bounds on the minimal depth required for a deep linear SSM to represent a given shallow linear SSM under constrained parameter norms. We also validate our theoretical results with numerical experiments.
Problem

Research questions and friction points this paper is trying to address.

How depth affects expressivity in deep linear SSMs
Comparing depth vs width impact on model capacity
Minimal depth needed to represent shallow SSMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep and wide SSMs are equivalent unconstrained
Deep SSMs outperform shallow under norm constraints
Constructive method links shallow and deep SSMs
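The norm-compression effect of depth can be illustrated with a minimal scalar sketch (a hypothetical example for intuition, not the paper's actual construction): a memoryless shallow SSM realizing a large gain G needs parameters of size about sqrt(G), while a cascade of L identical layers only needs per-layer parameters of size about G^(1/(2L)), which shrinks exponentially in L.

```python
import numpy as np

def ssm_layer(x, a, b, c):
    """One scalar linear SSM layer: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt
        ys.append(c * h)
    return np.array(ys)

x = np.array([1.0, 0.5, -0.3, 0.2])

# Shallow target (a = 0) realizing gain G: needs b = c = sqrt(G), i.e. norm ~1000.
G = 1e6
shallow = ssm_layer(x, a=0.0, b=np.sqrt(G), c=np.sqrt(G))

# Deep emulation: L stacked layers, each with b = c = G**(1/(2L)) ~ 3.16.
L = 6
g = G ** (1 / (2 * L))
deep = x
for _ in range(L):
    deep = ssm_layer(deep, a=0.0, b=g, c=g)

# Same input-output map, exponentially smaller per-layer parameter norm.
assert np.allclose(shallow, deep)
```

Here each layer contributes a multiplicative gain g**2 = G**(1/L), so the cascade reproduces the shallow map exactly while every parameter stays at G**(1/(2L)). The paper's constructions handle general (non-memoryless) linear SSMs; this sketch only conveys why depth compresses norms.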
Zeyu Bao — National University of Singapore (Machine Learning)
Penghao Yu — Department of Mathematics, National University of Singapore
Haotian Jiang — Department of Mathematics, Institute for Functional Intelligent Materials, National University of Singapore
Qianxiao Li — Assistant Professor, Department of Mathematics and Institute for Functional Intelligent Materials
Research interests: applied mathematics, machine learning, scientific computing, control theory, materials science