🤖 AI Summary
To address the challenge in long-horizon dynamical forecasting where Gaussian processes (GPs) struggle to simultaneously capture recent sensitivity and retain historical information, this paper proposes a data-driven dynamic forgetting mechanism. The core innovation is the Random Fourier Decayed Signature Feature (RFDSF), which for the first time embeds learnable temporal decay into the signature kernel, enabling principled time-aware forgetting together with Bayesian uncertainty quantification. The method integrates sparse spectral GP approximation, variational inference, and a recurrent architecture to yield an efficient, scalable model that produces full-horizon joint predictive distributions in a single forward pass. On sequences of up to 10,000 timesteps, it achieves millisecond-scale inference (≈0.01 s) with under 1 GB of GPU memory. It outperforms existing signature-GP approaches in accuracy and matches state-of-the-art probabilistic time-series models.
📝 Abstract
The signature kernel is a kernel between time series of arbitrary length and comes with strong theoretical guarantees from stochastic analysis. It has found applications in machine learning, for example as a covariance function for Gaussian processes. A strength of the underlying signature features is that they provide a structured global description of a time series. However, this property can quickly become a curse when local information is essential and forgetting is required; so far this has only been addressed with ad-hoc methods such as slicing the time series into subsegments. To overcome this, we propose a principled, data-driven approach by introducing a novel forgetting mechanism for signatures. This allows the model to dynamically adapt its context length to focus on more recent information. To achieve this, we revisit the recently introduced Random Fourier Signature Features and develop Random Fourier Decayed Signature Features (RFDSF), which we combine with Gaussian processes (GPs). The result is a Bayesian time series forecasting algorithm with variational inference: a scalable probabilistic model that processes a time series into a joint predictive distribution over time steps in one recurrent pass, e.g. handling a sequence of $10^4$ steps in $\approx 10^{-2}$ seconds and under $1\,\text{GB}$ of GPU memory. We demonstrate that it outperforms other GP-based alternatives and competes with state-of-the-art probabilistic time series forecasting algorithms.
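The forgetting idea can be illustrated with a toy sketch. Everything below is an illustrative assumption rather than the paper's exact construction: each increment of the series is passed through a random Fourier feature map, and signature-style features are accumulated level by level, with a decay factor `decay` (λ ∈ (0, 1)) exponentially downweighting older increments; λ → 1 recovers ordinary non-forgetting accumulation.

```python
import numpy as np

def rfdsf_sketch(x, num_features=64, levels=2, decay=0.9, lengthscale=1.0, seed=0):
    """Toy decayed signature-style features (illustrative, not the paper's method).

    x: (T, d) time series. Each increment is mapped through a random Fourier
    feature approximation of an RBF kernel, then level-m features are
    accumulated recursively with exponential forgetting factor `decay`.
    """
    rng = np.random.default_rng(seed)
    T, d = x.shape
    # Random Fourier feature parameters for an RBF kernel on increments.
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    # phi[m] holds the running level-m feature; level 0 is the constant 1.
    phi = [np.ones(num_features)] + [np.zeros(num_features) for _ in range(levels)]
    for t in range(1, T):
        dx = x[t] - x[t - 1]
        z = np.sqrt(2.0 / num_features) * np.cos(dx @ W + b)  # RFF of increment
        # Update higher levels first, so each level uses the previous step's
        # lower-level state; `decay` downweights contributions of old increments.
        for m in range(levels, 0, -1):
            phi[m] = decay * phi[m] + phi[m - 1] * z
    return np.concatenate(phi[1:])
```

With a small `decay`, the returned features are dominated by recent increments, giving the context-length adaptation described above; in a GP model such a feature map would define the (approximate) covariance, with λ learned alongside the other hyperparameters.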