🤖 AI Summary
Gaussian processes (GPs) scale poorly to large spatial datasets because of cubic time and quadratic memory complexity in the number of observations $N$. This work systematically evaluates six classes of scalable GP approximations within a unified experimental framework across three core tasks: marginal likelihood evaluation, hyperparameter estimation, and prediction. It provides the first comprehensive, quantitative comparison of their accuracy–runtime trade-offs on both multi-scale synthetic benchmarks and large-scale real-world spatial datasets. Key findings: the Vecchia approximation is the most accurate and robust method overall, while low-rank inducing-point methods, namely the full-scale approximation (FSA) and the modified predictive process (MPP), can yield better-calibrated predictive distributions in certain extrapolation settings. These results establish a rigorous, task-specific benchmark and offer practical guidance for selecting GP approximations in spatial statistics applications.
📝 Abstract
Gaussian processes (GPs) are flexible, probabilistic, non-parametric models widely employed in fields such as spatial statistics, time series analysis, and machine learning. A drawback of Gaussian processes is their computational cost of $\mathcal{O}(N^3)$ time and $\mathcal{O}(N^2)$ memory in the number of observations $N$, which makes them prohibitive for large datasets. Numerous approximation techniques have been proposed to address this limitation. In this work, we systematically compare the accuracy of different Gaussian process approximations for marginal likelihood evaluation, parameter estimation, and prediction, taking into account the time required to achieve a given accuracy. We analyze this trade-off between accuracy and runtime on multiple simulated and large-scale real-world datasets and find that Vecchia approximations consistently emerge as the most accurate in almost all experiments. However, for certain real-world datasets, low-rank inducing-point-based methods, i.e., full-scale and modified predictive process approximations, can provide more accurate predictive distributions for extrapolation.
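To make the complexity claim and the Vecchia idea concrete, here is a minimal sketch (not code from the paper) comparing the exact GP log marginal likelihood, whose Cholesky factorization costs $\mathcal{O}(N^3)$, with a Vecchia approximation that replaces the joint density by a product of conditionals, each conditioned on only $m$ nearby previously ordered points. The RBF kernel, 1-D inputs, hyperparameter values, and nearest-neighbor ordering are illustrative assumptions; real spatial implementations use 2-D coordinates and more careful ordering schemes.

```python
import numpy as np

def rbf(x1, x2, ls=0.2, var=1.0):
    # illustrative squared-exponential kernel on 1-D inputs
    d = x1[:, None] - x2[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

def exact_gp_loglik(x, y, noise=1e-2):
    # exact log marginal likelihood: O(N^3) Cholesky of the full N x N covariance
    K = rbf(x, x) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)

def vecchia_loglik(x, y, m=5, noise=1e-2):
    # Vecchia: order points, then sum log p(y_i | y_{c(i)}) over conditioning
    # sets c(i) of at most m nearest previously ordered points -> O(N m^3)
    order = np.argsort(x)
    x, y = x[order], y[order]
    total = 0.0
    for i in range(len(x)):
        nb = np.argsort(np.abs(x[:i] - x[i]))[:m]  # conditioning set
        K_nn = rbf(x[nb], x[nb]) + noise * np.eye(len(nb))
        k_in = rbf(x[[i]], x[nb])[0]
        k_ii = rbf(x[[i]], x[[i]])[0, 0] + noise
        if len(nb):
            sol = np.linalg.solve(K_nn, k_in)
            mu, var = sol @ y[nb], k_ii - sol @ k_in
        else:
            mu, var = 0.0, k_ii
        total += -0.5 * ((y[i] - mu) ** 2 / var + np.log(2 * np.pi * var))
    return total
```

With $m = N - 1$ the conditioning sets are complete and the chain rule makes the Vecchia value coincide with the exact log likelihood; the accuracy–runtime trade-off studied in the paper corresponds to shrinking $m$ well below $N$.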