AI Summary
This paper addresses nonparametric regression and classification for path-valued data. It introduces a functional Nadaraya-Watson estimator that combines the signature transform with local kernel regression, using the signature-induced distance to measure path similarity in a natural metric space of paths. Because the estimator stays within the classical local kernel regression framework, it avoids the large-scale kernel matrix computations that limit scalability. Robust variants of the signature transform add stability against outliers. Evaluations on synthetic and real-world datasets, covering parameter estimation for stochastic differential equations and time-series classification, show competitive predictive accuracy together with significant computational advantages over existing methods.
Abstract
We study nonparametric regression and classification for path-valued data. We introduce a functional Nadaraya-Watson estimator that combines the signature transform from rough path theory with local kernel regression. The signature transform provides a principled way to encode sequential data through iterated integrals, enabling direct comparison of paths in a natural metric space. Our approach leverages signature-induced distances within the classical kernel regression framework, achieving computational efficiency while avoiding the scalability bottlenecks of large-scale kernel matrix operations. We establish finite-sample convergence bounds demonstrating favorable statistical properties of signature-based distances compared to traditional metrics in infinite-dimensional settings. We propose robust signature variants that provide stability against outliers, enhancing practical performance. Applications to both synthetic and real-world data (including stochastic differential equation learning and time series classification) demonstrate competitive accuracy while offering significant computational advantages over existing methods.
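To make the estimator described above concrete, here is a minimal sketch of a Nadaraya-Watson prediction that uses a signature-induced distance. It rests on several assumptions that do not come from the paper: the signature is truncated at depth 2, paths are treated as piecewise linear, the distance is the Euclidean norm between truncated signatures, and the kernel is Gaussian. The names `signature_depth2` and `nw_predict` are hypothetical, and the paper's robust signature variants are not implemented here.

```python
import numpy as np


def signature_depth2(path):
    """Depth-2 truncated signature of a piecewise-linear path.

    `path` has shape (n_points, d). Returns a vector of length d + d*d
    holding the level-1 and level-2 iterated integrals.
    """
    path = np.asarray(path, dtype=float)
    inc = np.diff(path, axis=0)       # segment increments, shape (n_points-1, d)
    start = path[:-1] - path[0]       # displacement accumulated before each segment
    level1 = inc.sum(axis=0)          # total increment of the path
    # Chen's identity for linear segments:
    # sum_k [ start_k (x) inc_k + 0.5 * inc_k (x) inc_k ]
    level2 = start.T @ inc + 0.5 * (inc.T @ inc)
    return np.concatenate([level1, level2.ravel()])


def nw_predict(train_paths, train_y, query_path, bandwidth=1.0):
    """Nadaraya-Watson estimate at `query_path` (illustrative choices only).

    Weights are Gaussian in the Euclidean distance between depth-2 signatures.
    """
    feats = np.stack([signature_depth2(p) for p in train_paths])
    query = signature_depth2(query_path)
    dist = np.linalg.norm(feats - query, axis=1)
    logits = -0.5 * (dist / bandwidth) ** 2
    logits -= logits.max()            # rescale for numerical stability; ratio unchanged
    weights = np.exp(logits)
    return float(weights @ np.asarray(train_y, dtype=float) / weights.sum())


# Toy usage: two 2-D paths that traverse the unit square in opposite orders.
path_a = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
path_b = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
prediction = nw_predict([path_a, path_b], [0.0, 1.0], path_b, bandwidth=1e-3)
```

Each prediction needs only the distances from the query to the training paths, so no n-by-n kernel matrix is built or inverted. For classification, the same weights can be applied to one-hot labels, taking the class with the largest weighted average.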