🤖 AI Summary
This work identifies a counterintuitive instability phenomenon in next-generation reservoir computing (NGRC): more training data can degrade performance. Even though NGRC learns a better representation of the flow map as data grows, it can adopt an ill-conditioned "integrator" and lose stability, an effect traced to the auxiliary dimensions created by the delayed states in the feature vector. To mitigate the instability, the authors propose two simple strategies: (i) increasing ℓ₂ (ridge) regularization in tandem with data size and (ii) carefully injecting noise during training. The mechanism is substantiated through flow-map learning experiments and quantitative conditioning analysis, and the results show that these strategies restore long-term prediction stability across data scales. The broader takeaway is that regularization strength should be tuned jointly with data size in data-driven modeling of dynamical systems.
📝 Abstract
It has been found recently that more data can, counter-intuitively, hurt the performance of deep neural networks. Here, we show that a more extreme version of the phenomenon occurs in data-driven models of dynamical systems. To elucidate the underlying mechanism, we focus on next-generation reservoir computing (NGRC) -- a popular framework for learning dynamics from data. We find that, despite learning a better representation of the flow map with more training data, NGRC can adopt an ill-conditioned "integrator" and lose stability. We link this data-induced instability to the auxiliary dimensions created by the delayed states in NGRC. Based on these findings, we propose simple strategies to mitigate the instability, either by increasing regularization strength in tandem with data size, or by carefully introducing noise during training. Our results highlight the importance of proper regularization in data-driven modeling of dynamical systems.
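To make the two mitigation strategies concrete, here is a minimal NGRC sketch in NumPy. It is an illustration under assumptions, not the paper's actual implementation: the feature construction (constant, delayed linear terms, unique quadratic monomials), the delay depth `k`, and the specific scaling `lam * N` for the ridge penalty are all choices made for this example. The key idea shown is that the effective ℓ₂ regularization grows with the number of training samples `N` (strategy i), and that Gaussian noise can optionally be added to the training inputs (strategy ii).

```python
import numpy as np

def phi_window(win):
    """Feature vector for one window of k consecutive states (oldest first):
    a constant, the k delayed states (newest first), and their unique
    pairwise products, as in a quadratic NGRC feature library."""
    lin = win[::-1].ravel()                       # newest state's components first
    iu, ju = np.triu_indices(lin.size)            # unique quadratic monomials
    return np.concatenate(([1.0], lin, lin[iu] * lin[ju]))

def ngrc_fit(X, k=2, lam=1e-8, noise=0.0, seed=0):
    """Fit an NGRC readout for the one-step increment x_{t+1} - x_t by ridge
    regression. The effective penalty is lam * N (N = number of samples),
    so regularization strength scales with data size; `noise` optionally
    perturbs the training inputs. Both knobs are this sketch's rendering
    of the paper's two mitigation strategies."""
    rng = np.random.default_rng(seed)
    Xtr = X + rng.normal(0.0, noise, X.shape) if noise > 0 else X
    T = len(X)
    Phi = np.array([phi_window(Xtr[t - k + 1 : t + 1]) for t in range(k - 1, T - 1)])
    Y = X[k:] - X[k - 1 : T - 1]                  # flow-map increments as targets
    N, D = Phi.shape
    # Ridge normal equations with data-scaled regularization lam * N.
    return np.linalg.solve(Phi.T @ Phi + lam * N * np.eye(D), Phi.T @ Y)

def ngrc_step(history, W):
    """Advance one step from the last k states (history: (k, d), oldest first)."""
    return history[-1] + phi_window(history) @ W
```

A fitted model is iterated by feeding each prediction back into the sliding window of delayed states; instability of the kind the paper analyzes shows up during exactly this closed-loop iteration, which is why the conditioning of the regression above matters for long-term forecasts.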