🤖 AI Summary
This work addresses key challenges in modeling time series and graph-structured data: dynamic evolution, long-range dependencies, and irregular sampling. Methodologically, it introduces a scalable machine learning framework grounded in path signatures, integrating rough path theory with modern learning paradigms: (i) a signature-kernel Gaussian process for uncertainty quantification in time series; (ii) Seq2Tens, a deep sequence model that uses low-rank tensor structure in the weight space to capture long-range dependencies efficiently; (iii) graph models in which expected signatures induce hypo-elliptic diffusion processes, offered as a principled alternative to conventional graph neural networks; and (iv) Recurrent Sparse Spectrum Signature Gaussian Processes, which combine signature kernels and random features with a forgetting mechanism for multi-horizon time series forecasting. The framework unifies path signatures, kernel methods, random Fourier features, and low-rank decomposition, pairing theoretical soundness with linear scalability. Experiments demonstrate state-of-the-art performance in multi-step time series forecasting and graph learning tasks, with strong expressivity, intrinsic interpretability, and computational efficiency.
📝 Abstract
The interface between stochastic analysis and machine learning is a rapidly evolving field. Path signatures, iterated integrals that provide faithful, hierarchical representations of paths, offer a principled and universal feature map for sequential and structured data. Rooted in rough path theory, they are invariant to reparameterization and well suited to modelling evolving dynamics, long-range dependencies, and irregular sampling, all common challenges in real-world time series and graph data.
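To make the notion of a signature as iterated integrals concrete, here is a minimal sketch in Python. It is not code from the thesis. It computes the first two signature levels of a piecewise-linear path using Chen's identity, and checks that resampling the same path more finely (a discrete form of reparameterization) leaves the result unchanged. The function name `signature_level2` and the example paths are hypothetical, chosen only for illustration.

```python
import numpy as np

def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: array of shape (n_points, d), the vertices of the path.
    Returns (s1, s2) with shapes (d,) and (d, d).
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)       # level 1: total increment
    s2 = np.zeros((d, d))  # level 2: iterated integrals of coordinate pairs
    for dx in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment with increment dx:
        # a linear segment has level-2 term dx (x) dx / 2.
        s2 += np.outer(s1, dx) + 0.5 * np.outer(dx, dx)
        s1 += dx
    return s1, s2

# An L-shaped path in the plane, sampled coarsely and then more finely.
coarse = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
fine = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.0, 0.5], [1.0, 1.0]])

s1_coarse, s2_coarse = signature_level2(coarse)
s1_fine, s2_fine = signature_level2(fine)

# Same geometric path, different sampling: the signatures agree.
same = np.allclose(s1_coarse, s1_fine) and np.allclose(s2_coarse, s2_fine)

# The antisymmetric part of level 2 is the Levy area (signed area).
levy_area = 0.5 * (s2_coarse[0, 1] - s2_coarse[1, 0])
```

The level-1 term records only the net displacement, while the level-2 term also captures the order in which the coordinates moved. This is the sense in which the representation is hierarchical.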
This thesis investigates how to harness the expressive power of path signatures within scalable machine learning pipelines. It introduces a suite of models that combine theoretical robustness with computational efficiency, bridging rough path theory with probabilistic modelling, deep learning, and kernel methods. Key contributions include: Gaussian processes with signature kernel-based covariance functions for uncertainty-aware time series modelling; the Seq2Tens framework, which employs low-rank tensor structure in the weight space for scalable deep modelling of long-range dependencies; and graph-based models where expected signatures over graphs induce hypo-elliptic diffusion processes, offering expressive yet tractable alternatives to standard graph neural networks. Further developments include Random Fourier Signature Features, a scalable kernel approximation with theoretical guarantees, and Recurrent Sparse Spectrum Signature Gaussian Processes, which combine Gaussian processes, signature kernels, and random features with a principled forgetting mechanism for multi-horizon time series forecasting with adaptive context length.
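As background for the random-feature contributions, the sketch below shows classical random Fourier features for a Gaussian (RBF) kernel on plain vectors. This is the standard construction, not the Random Fourier Signature Features of the thesis, which apply to signature kernels on paths. It illustrates only the general idea that an explicit random feature map can stand in for a kernel evaluation. The names `rff_features` and `rbf_kernel`, the feature count, and the lengthscale are assumptions made for this example.

```python
import numpy as np

def rff_features(X, n_features, lengthscale, rng):
    """Random Fourier features approximating an RBF kernel.

    X: array of shape (n, d). Returns an array of shape (n, n_features).
    """
    d = X.shape[1]
    # Frequencies drawn from the spectral density of the RBF kernel.
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def rbf_kernel(X, Y, lengthscale):
    """Exact RBF kernel matrix, used here only as a reference."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

Z = rff_features(X, n_features=20000, lengthscale=1.0, rng=rng)
K_approx = Z @ Z.T                      # inner products of random features
K_exact = rbf_kernel(X, X, lengthscale=1.0)
max_error = np.max(np.abs(K_approx - K_exact))
```

Once the features `Z` are computed, downstream models work with an n-by-D feature matrix instead of an n-by-n kernel matrix, which is where the cost savings of random-feature methods come from.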
We hope this thesis serves as both a methodological toolkit and a conceptual bridge, and provides a useful reference for the current state of the art in scalable, signature-based learning for sequential and structured data.