How more data can hurt: Instability and regularization in next-generation reservoir computing

📅 2024-07-11
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This work identifies a counterintuitive instability phenomenon, in which more training data degrades performance, in next-generation reservoir computing (NGRC). The instability arises when the learned model adopts an ill-conditioned "integrator," an effect linked to the auxiliary dimensions created by delayed states in the NGRC feature vector, and it undermines long-term prediction stability. The authors propose two simple mitigation strategies: (i) increasing the ℓ₂ (ridge) regularization strength in tandem with the training-data size, and (ii) carefully injecting noise during training. They support the proposed mechanism through flow-map learning experiments and conditioning analysis, and show that both strategies restore stable long-term predictions across data-size regimes. The study highlights that regularization in data-driven models of dynamical systems should be chosen with the data scale in mind.

📝 Abstract
It has been found recently that more data can, counter-intuitively, hurt the performance of deep neural networks. Here, we show that a more extreme version of the phenomenon occurs in data-driven models of dynamical systems. To elucidate the underlying mechanism, we focus on next-generation reservoir computing (NGRC), a popular framework for learning dynamics from data. We find that, despite learning a better representation of the flow map with more training data, NGRC can adopt an ill-conditioned "integrator" and lose stability. We link this data-induced instability to the auxiliary dimensions created by the delayed states in NGRC. Based on these findings, we propose simple strategies to mitigate the instability, either by increasing regularization strength in tandem with data size, or by carefully introducing noise during training. Our results highlight the importance of proper regularization in data-driven modeling of dynamical systems.
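The two mitigations in the abstract (scaling the ridge penalty with data size, and injecting training noise) can be illustrated with a minimal NGRC-style sketch. This is not the paper's code: the feature map (delayed states plus their quadratic monomials), the function names, and the use of the Hénon map as a toy system are all illustrative assumptions.

```python
import numpy as np

def _features(lin):
    """NGRC-style feature vector: constant + linear delayed states
    + upper-triangular quadratic monomials of those states."""
    iu = np.triu_indices(lin.shape[1])
    quad = lin[:, :, None] * lin[:, None, :]
    return np.hstack([np.ones((len(lin), 1)), lin, quad[:, iu[0], iu[1]]])

def ngrc_fit(traj, k=2, lam_per_sample=1e-8, noise_std=0.0, rng=None):
    """Ridge-regression readout predicting x[t+1] from the k delayed
    states [x[t], ..., x[t-k+1]].

    lam_per_sample: the ridge penalty is lam_per_sample * N, so it grows
        in tandem with the number N of training rows (mitigation (i)).
    noise_std: optional Gaussian noise added to the training features
        (mitigation (ii)); both knobs are illustrative, not the paper's API.
    """
    rng = rng or np.random.default_rng(0)
    T = len(traj)
    # row at time t holds [x[t], x[t-1], ..., x[t-k+1]]
    lin = np.hstack([traj[k - 1 - i:T - 1 - i] for i in range(k)])
    if noise_std > 0:
        lin = lin + noise_std * rng.standard_normal(lin.shape)
    feats = _features(lin)
    targets = traj[k:]                       # next-step states
    lam = lam_per_sample * len(feats)        # penalty scales with data size
    A = feats.T @ feats + lam * np.eye(feats.shape[1])
    return np.linalg.solve(A, feats.T @ targets)

def ngrc_step(W, recent):
    """One-step prediction from a (k, d) window, most recent state first."""
    return (_features(recent.reshape(1, -1)) @ W)[0]

# Demo: learn the Henon map (quadratic, so exactly representable here).
traj = np.empty((2000, 2))
traj[0] = [0.1, 0.1]
for n in range(1999):
    x, y = traj[n]
    traj[n + 1] = [1.0 - 1.4 * x * x + y, 0.3 * x]

W = ngrc_fit(traj[:1500], k=2, lam_per_sample=1e-8)
err = max(
    np.abs(ngrc_step(W, np.stack([traj[t], traj[t - 1]])) - traj[t + 1]).max()
    for t in range(1500, 1999)
)
```

Because the penalty is specified per sample, refitting with a longer trajectory automatically strengthens the regularization, which is the data-scale-aware behavior the paper argues for.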
Problem

Research questions and friction points this paper is trying to address.

Dimensionality Reduction
Data Overloading
NGRC Method
Innovation

Methods, ideas, or system contributions that make the work stand out.

NGRC Stability
Data-Driven Training
Complexity Control