🤖 AI Summary
This work investigates when incorporating memory of past states helps in modeling time-dependent partial differential equations (PDEs), questioning the common Markovian assumption that the evolution depends only on the current state. Motivated by the Mori–Zwanzig theory of model reduction, we exhibit simple (even linear) PDEs for which a solution that uses memory is arbitrarily better than a Markovian one. Building on this insight, we propose the Memory Neural Operator (MemNO), a neural operator architecture that combines the S4 state-space model with the Fourier Neural Operator (FNO) to model memory effectively. When the PDE data are supplied at low resolution or contain observation noise at train and test time, MemNO achieves up to a 6× reduction in test error compared with baselines without memory. The benefit is most pronounced for PDEs whose solutions have significant high-frequency Fourier modes (e.g., low-viscosity fluid dynamics), and we construct a challenging benchmark dataset consisting of such PDEs.
📝 Abstract
Data-driven techniques have emerged as a promising alternative to traditional numerical methods for solving PDEs. For time-dependent PDEs, many approaches are Markovian -- the evolution of the trained system only depends on the current state, and not the past states. In this work, we investigate the benefits of using memory for modeling time-dependent PDEs: that is, when past states are explicitly used to predict the future. Motivated by the Mori-Zwanzig theory of model reduction, we theoretically exhibit examples of simple (even linear) PDEs, in which a solution that uses memory is arbitrarily better than a Markovian solution. Additionally, we introduce Memory Neural Operator (MemNO), a neural operator architecture that combines recent state space models (specifically, S4) and Fourier Neural Operators (FNOs) to effectively model memory. We empirically demonstrate that when the PDEs are supplied in low resolution or contain observation noise at train and test time, MemNO significantly outperforms the baselines without memory -- with up to 6x reduction in test error. Furthermore, we show that this benefit is particularly pronounced when the PDE solutions have significant high-frequency Fourier modes (e.g., low-viscosity fluid dynamics) and we construct a challenging benchmark dataset consisting of such PDEs.
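The gap between Markovian and memory-based predictors can be illustrated with a toy finite-dimensional analogue. The sketch below is not the paper's construction and not MemNO: the two-variable linear system, the matrix `A`, and the function names are all assumptions made for illustration. A fully observed linear system is Markovian, but if only one of its two components is observed (a crude stand-in for low-resolution data), the observed sequence obeys a second-order recurrence, so a predictor that also sees the previous observation is exact while the best one-step linear predictor is not.

```python
import numpy as np


def simulate(A, x0, steps):
    """Roll out the full linear system x_{t+1} = A x_t (hypothetical toy model)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(A @ xs[-1])
    return np.array(xs)


def fit_predictors(y):
    """Least-squares fit of two linear predictors for y[t+1] on the same data.

    Markovian: y[t+1] ~ c * y[t]
    Memory:    y[t+1] ~ w0 * y[t] + w1 * y[t-1]
    Returns the in-sample RMS error of each and the memory weights.
    """
    target = y[2:]
    X_markov = y[1:-1, None]                      # current observation only
    X_memory = np.stack([y[1:-1], y[:-2]], axis=1)  # current and previous observation

    c, *_ = np.linalg.lstsq(X_markov, target, rcond=None)
    w, *_ = np.linalg.lstsq(X_memory, target, rcond=None)

    rms = lambda r: float(np.sqrt(np.mean(r ** 2)))
    return rms(X_markov @ c - target), rms(X_memory @ w - target), w


# Assumed toy dynamics: a slightly damped rotation in the plane.
theta, decay = 0.3, 0.99
A = decay * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])

x = simulate(A, np.array([1.0, 0.0]), steps=60)
y = x[:, 0]  # only the first component is observed

markov_err, memory_err, w = fit_predictors(y)
print("Markovian RMS error:", markov_err)
print("Memory RMS error:   ", memory_err)
```

By the Cayley-Hamilton theorem, the observed component satisfies y[t+2] = tr(A) y[t+1] - det(A) y[t], so the fitted memory weights should match `[np.trace(A), -np.linalg.det(A)]` up to floating-point error. This is only a linear, two-step caricature of the idea; the abstract's claim concerns PDEs and learned operators.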