🤖 AI Summary
This work investigates how depth shapes the expressive power of deep linear state-space models (SSMs) under parameter norm constraints, focusing on the depth-width trade-off. Using constructive proofs and linear systems theory, the authors give a rigorous characterization of how deep SSMs with bounded parameter norms can emulate shallow SSMs that would otherwise require arbitrarily large norms, and derive upper bounds on the minimal depth needed for such emulation. The theory shows that, under spectral norm constraints, depth enables an exponential reduction in the parameter norms needed to realize a given input-output mapping; increasing width alone cannot yield a comparable reduction. Numerical experiments corroborate the expressive advantage conferred by depth and validate the theoretical bounds, revealing depth's distinctive role in structured sequence modeling.
📝 Abstract
Deep state-space models (SSMs) have gained increasing popularity in sequence modelling. While there are numerous theoretical investigations of shallow SSMs, how the depth of an SSM affects its expressiveness remains an open problem. In this paper, we systematically investigate the roles of depth and width in deep linear SSMs, aiming to characterize how they influence the expressive capacity of the architecture. First, we rigorously prove that, in the absence of parameter constraints, increasing depth and increasing width are generally equivalent, provided that the parameter count remains within the same order of magnitude. Under constrained parameter norms, however, the effects of depth and width differ significantly: we show by an explicit construction that a shallow linear SSM with large parameter norms can be represented by a deep linear SSM with smaller norms. In particular, this demonstrates that, under norm constraints, deep SSMs are more capable than shallow SSMs of representing targets with large norms. Finally, we derive upper bounds on the minimal depth required for a deep linear SSM to represent a given shallow linear SSM under constrained parameter norms. We also validate our theoretical results with numerical experiments.
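The depth-vs-norm phenomenon the abstract describes can be made concrete in a toy scalar case (our own illustration, not the paper's construction). A one-layer scalar SSM with impulse response `M * a**(t-1)` forces some parameter to have magnitude at least `sqrt(M)`, since the response at `t = 1` equals `c * b = M`. A deeper stack can spread the gain `M` across `L` feed-through layers of gain `M**(1/L)` each, so every layer's parameters stay small while the composed input-output map is unchanged:

```python
import numpy as np

def run_ssm(u, a, b, c, d):
    """Simulate a scalar linear SSM layer: h_{t+1} = a h_t + b u_t, y_t = c h_t + d u_t."""
    h = 0.0
    y = np.empty_like(u)
    for t, ut in enumerate(u):
        y[t] = c * h + d * ut
        h = a * h + b * ut
    return y

rng = np.random.default_rng(0)
u = rng.standard_normal(64)
M, a, L = 100.0, 0.5, 4

# Shallow target: impulse response M * a**(t-1). Any single-layer
# realization needs max(|b|, |c|) >= sqrt(M), since c * b = M.
y_shallow = run_ssm(u, a=a, b=1.0, c=M, d=0.0)

# Deep emulation: one dynamic layer with unit-size parameters, then L
# static feed-through layers of gain M**(1/L); the composed gain is M,
# but every individual parameter is at most max(M**(1/L), 1).
y_deep = run_ssm(u, a=a, b=1.0, c=1.0, d=0.0)
for _ in range(L):
    y_deep = run_ssm(y_deep, a=0.0, b=0.0, c=0.0, d=M ** (1 / L))

print(np.allclose(y_deep, y_shallow))   # same input-output map
print(M ** (1 / L), "<", np.sqrt(M))    # per-layer norm: deep vs shallow
```

Here the largest parameter in the deep stack scales like `M**(1/L)`, shrinking exponentially in depth, whereas the shallow realization is stuck at `sqrt(M)`; the paper establishes the analogous statement rigorously for general (matrix-valued) linear SSMs under spectral norm constraints.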