🤖 AI Summary
To address the challenges of long-range spatiotemporal modeling, low parameter efficiency, and poor generalization in learning solution operators for time-dependent partial differential equations (PDEs), this paper proposes the Spatiotemporal State Space Neural Operator (ST-SSM). ST-SSM is the first to integrate structured state space models (SSMs) into the neural operator framework, employing spatiotemporal decoupling: temporal dynamics are captured via SSMs for long-range evolution, while spatial interactions—under diverse physical conditions—are modeled through factorized SSMs to encode non-local dependencies. Theoretically, we establish a unifying universality theorem linking SSMs and neural operators. Empirically, ST-SSM achieves superior accuracy with significantly fewer parameters across multiple PDE benchmarks and demonstrates enhanced robustness under partial observability.
📝 Abstract
We propose the Spatiotemporal State Space Neural Operator (ST-SSM), a compact architecture for learning solution operators of time-dependent partial differential equations (PDEs). ST-SSM introduces a novel factorization of the spatial and temporal dimensions, using structured state-space models to independently model temporal evolution and spatial interactions. This design enables parameter efficiency and flexible modeling of long-range spatiotemporal dynamics. A theoretical connection is established between SSMs and neural operators, and a unified universality theorem is proved for the resulting class of architectures. Empirically, we demonstrate that our factorized formulation outperforms alternative schemes such as zigzag scanning and parallel independent processing on several PDE benchmarks, including 1D Burgers' equation, 1D Kuramoto-Sivashinsky equation, and 2D Navier-Stokes equations under varying physical conditions. Our model performs competitively with existing baselines while using significantly fewer parameters. In addition, our results reinforce previous findings on the benefits of temporal memory by showing improved performance under partial observability. Our results highlight the advantages of dimensionally factorized operator learning for efficient and generalizable PDE modeling, and put this approach on a firm theoretical footing.