🤖 AI Summary
Gaussian processes (GPs) scale poorly to large spatial datasets because of cubic time and quadratic memory complexity in the number of observations $N$. This work systematically evaluates six classes of scalable GP approximations within a unified experimental framework across three core tasks: marginal likelihood evaluation, hyperparameter estimation, and prediction. It provides the first comprehensive, quantitative comparison of their accuracy–runtime trade-offs on both multi-scale synthetic benchmarks and large-scale real-world spatial datasets. Key findings: the Vecchia approximation is the most accurate and robust method overall, while low-rank inducing-point methods, namely the full-scale approximation (FSA) and the modified predictive process (MPP), can yield better-calibrated predictive distributions in certain extrapolation settings. These results establish a rigorous, task-specific benchmark and offer practical guidance for selecting GP approximations in spatial statistics applications.
📝 Abstract
Gaussian processes (GPs) are flexible, probabilistic, non-parametric models widely employed in fields such as spatial statistics, time series analysis, and machine learning. A drawback of Gaussian processes is their computational cost of $\mathcal{O}(N^3)$ time and $\mathcal{O}(N^2)$ memory in the number of observations $N$, which makes them prohibitive for large datasets. Numerous approximation techniques have been proposed to address this limitation. In this work, we systematically compare the accuracy of different Gaussian process approximations for marginal likelihood evaluation, parameter estimation, and prediction, taking into account the time required to achieve a given accuracy. We analyze this trade-off between accuracy and runtime on multiple simulated and large-scale real-world datasets and find that Vecchia approximations consistently emerge as the most accurate in almost all experiments. However, for certain real-world datasets, low-rank inducing-point-based methods, i.e., full-scale and modified predictive process approximations, can provide more accurate predictive distributions for extrapolation.
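To make the complexity claim and the Vecchia idea concrete, here is a minimal sketch (not code from the paper) comparing the exact GP log marginal likelihood, whose Cholesky factorization costs $\mathcal{O}(N^3)$, with a Vecchia approximation that replaces the joint density by a product of conditionals, each conditioned on only $m$ nearby previously ordered points. The RBF kernel, 1-D inputs, hyperparameter values, and nearest-neighbor ordering are illustrative assumptions; real spatial implementations use 2-D coordinates and more careful ordering schemes.

```python
import numpy as np

def rbf(x1, x2, ls=0.2, var=1.0):
    # illustrative squared-exponential kernel on 1-D inputs
    d = x1[:, None] - x2[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

def exact_gp_loglik(x, y, noise=1e-2):
    # exact log marginal likelihood: O(N^3) Cholesky of the full N x N covariance
    K = rbf(x, x) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)

def vecchia_loglik(x, y, m=5, noise=1e-2):
    # Vecchia: order points, then sum log p(y_i | y_{c(i)}) over conditioning
    # sets c(i) of at most m nearest previously ordered points -> O(N m^3)
    order = np.argsort(x)
    x, y = x[order], y[order]
    total = 0.0
    for i in range(len(x)):
        nb = np.argsort(np.abs(x[:i] - x[i]))[:m]  # conditioning set
        K_nn = rbf(x[nb], x[nb]) + noise * np.eye(len(nb))
        k_in = rbf(x[[i]], x[nb])[0]
        k_ii = rbf(x[[i]], x[[i]])[0, 0] + noise
        if len(nb):
            sol = np.linalg.solve(K_nn, k_in)
            mu, var = sol @ y[nb], k_ii - sol @ k_in
        else:
            mu, var = 0.0, k_ii
        total += -0.5 * ((y[i] - mu) ** 2 / var + np.log(2 * np.pi * var))
    return total
```

With $m = N - 1$ the conditioning sets are complete and the chain rule makes the Vecchia value coincide with the exact log likelihood; the accuracy–runtime trade-off studied in the paper corresponds to shrinking $m$ well below $N$.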