gp2Scale: A Class of Compactly-Supported Non-Stationary Kernels and Distributed Computing for Exact Gaussian Processes on 10 Million Data Points

📅 2025-12-05
🤖 AI Summary
Existing Gaussian process (GP) methods struggle to achieve computational efficiency, predictive accuracy, and model customizability at the same time on large datasets. The main reason is their reliance on approximations, which degrade uncertainty quantification, restrict kernel and noise-model design, and make expressive non-stationary patterns hard to model. This paper introduces gp2Scale, a framework that uses flexible, compactly supported, non-stationary kernels to expose naturally occurring sparsity in the covariance matrix. This enables exact GP inference on more than 10 million observations without inducing points, kernel interpolation, or neighborhood-based approximations. By combining distributed computing with sparse linear algebra, gp2Scale solves the linear systems and computes the log-determinants needed for training. The method accommodates arbitrary kernel functions, noise models, mean functions, and input spaces. On several real-world datasets, gp2Scale matches or outperforms state-of-the-art approximate GP methods in many cases. Its main advantage, however, is that it places no restrictions on how the GP is customized.
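
The following is a minimal sketch of why compact support produces sparsity. It assumes a stationary Wendland kernel, which is one standard compactly supported choice; gp2Scale's own kernels are non-stationary and more flexible. Because the kernel is exactly zero beyond a radius r, only point pairs closer than r need to be found (here with SciPy's KD-tree) and stored, and the dense n × n matrix is never formed. The names `wendland` and `sparse_covariance` are illustrative and are not part of any gp2Scale API.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.spatial import cKDTree


def wendland(r):
    """Wendland function (1 - r)^4 (4r + 1) for r < 1, exactly 0 beyond; PD in up to 3 dimensions."""
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)


def sparse_covariance(x, radius, signal_var=1.0):
    """Covariance k(x, x') = signal_var * wendland(|x - x'| / radius), stored sparsely.

    Only pairs closer than `radius` can be nonzero, so a KD-tree finds them
    without ever forming the dense n x n matrix.
    """
    n = len(x)
    pairs = cKDTree(x).query_pairs(r=radius, output_type="ndarray")  # shape (m, 2), i < j
    i, j = pairs[:, 0], pairs[:, 1]
    vals = signal_var * wendland(np.linalg.norm(x[i] - x[j], axis=1) / radius)
    diag = np.arange(n)
    rows = np.concatenate([i, j, diag])  # mirror the i < j pairs to get a symmetric matrix
    cols = np.concatenate([j, i, diag])
    data = np.concatenate([vals, vals, np.full(n, signal_var)])
    K = coo_matrix((data, (rows, cols)), shape=(n, n)).tocsr()
    K.eliminate_zeros()  # drop pairs lying exactly on the support boundary
    return K


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(2000, 2))
K = sparse_covariance(x, radius=0.05)
print(f"stored entries: {K.nnz}, density: {K.nnz / 2000**2:.4f}")
```

With 2,000 uniformly scattered points in the unit square and r = 0.05, fewer than 1% of the entries are stored. The fraction falls further as the support radius shrinks relative to the domain.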

📝 Abstract
Despite a large corpus of recent work on scaling up Gaussian processes, a stubborn trade-off between computational speed, prediction and uncertainty quantification accuracy, and customizability persists. This is because the vast majority of existing methodologies rely on various levels of approximation that lower accuracy and limit the flexibility of kernel and noise-model designs, an unacceptable drawback at a time when expressive non-stationary kernels are on the rise in many fields. Here, we propose a methodology we term gp2Scale that scales exact Gaussian processes to more than 10 million data points without relying on inducing points, kernel interpolation, or neighborhood-based approximations, instead leveraging an existing capability of a GP: its kernel design. Highly flexible, compactly supported, and non-stationary kernels lead to the identification of naturally occurring sparse structure in the covariance matrix, which is then exploited to compute the linear-system solution and the log-determinant needed for training. We demonstrate our method's functionality on several real-world datasets and compare it with state-of-the-art approximation algorithms. Although we show superior approximation performance in many cases, the method's real power lies in its agnosticism toward arbitrary GP customizations (core kernel design, noise, and mean functions) and toward the type of input space, making it optimally suited for modern Gaussian process applications.
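
Training requires the quadratic form yᵀA⁻¹y and log|A|, where A = K + σ²I, in the log marginal likelihood log p(y) = −½ yᵀA⁻¹y − ½ log|A| − (n/2) log 2π. The sketch below obtains both from a single sparse LU factorization (SciPy's `splu`, which wraps SuperLU), so the result is exact up to floating-point error rather than approximated. This factorization is a swapped-in, single-machine stand-in. For symmetric positive-definite matrices a sparse Cholesky factorization is the more natural choice, and gp2Scale's own distributed solver and log-determinant strategy are not reproduced here. The helper names are hypothetical, and the data are synthetic.

```python
import numpy as np
from scipy.sparse import coo_matrix, identity
from scipy.sparse.linalg import splu
from scipy.spatial import cKDTree


def wendland(r):
    # Compactly supported Wendland function: exactly zero for r >= 1.
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)


def sparse_kernel_matrix(x, radius):
    # Store only the pairs inside the kernel's support; x has shape (n, d).
    n = len(x)
    pairs = cKDTree(x).query_pairs(r=radius, output_type="ndarray")
    i, j = pairs[:, 0], pairs[:, 1]
    v = wendland(np.linalg.norm(x[i] - x[j], axis=1) / radius)
    diag = np.arange(n)
    rows = np.concatenate([i, j, diag])
    cols = np.concatenate([j, i, diag])
    data = np.concatenate([v, v, np.ones(n)])
    return coo_matrix((data, (rows, cols)), shape=(n, n)).tocsc()


def log_marginal_likelihood(K, y, noise_var):
    """log p(y) = -y^T A^{-1} y / 2 - log|A| / 2 - (n / 2) log(2 pi), A = K + noise_var * I."""
    n = len(y)
    A = (K + noise_var * identity(n, format="csc")).tocsc()
    lu = splu(A)         # Pr @ A @ Pc = L @ U, with unit-diagonal L
    alpha = lu.solve(y)  # A^{-1} y from the same factorization
    # |det A| = prod |U_ii|; A is symmetric positive definite, so det A > 0.
    logdet = np.sum(np.log(np.abs(lu.U.diagonal())))
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * n * np.log(2.0 * np.pi)


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=(1000, 1))
y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(1000)
K = sparse_kernel_matrix(x, radius=0.5)
print(log_marginal_likelihood(K, y, noise_var=0.01))
```

At large n, the fill-in created during factorization, rather than the nonzeros of K itself, usually dominates memory. Fill-reducing column orderings (`splu` defaults to COLAMD) are what keep this tractable.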
Problem

Scaling exact Gaussian processes to over 10 million data points without approximations.
Enabling flexible non-stationary kernel designs while maintaining computational efficiency.
Overcoming trade-offs between speed, accuracy, and customizability in GP methods.
Innovation

Compactly supported, non-stationary kernels yield sparse covariance matrices
Exploits kernel design for exact Gaussian processes without approximations
Scales to over 10 million data points using distributed computing
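
The two ideas above can be illustrated together under stated assumptions. Below, a non-stationary kernel is formed as the product of a stationary Wendland kernel and a sum of products of compactly supported bump functions, k(x, x') = w(|x − x'| / r) · Σᵢ gᵢ(x) gᵢ(x'). Both factors are positive semi-definite (the Wendland factor for inputs of dimension at most 3), so their elementwise product is too (Schur product theorem), and it vanishes wherever either factor does. This particular construction, the bump parameters, and all helper names are hypothetical choices made for illustration, not gp2Scale's kernel library. The matrix is assembled tile by tile, with a `ThreadPoolExecutor` standing in for distributed workers. A real deployment would send tiles to cluster workers and could skip tiles whose points are provably farther apart than r.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from scipy.sparse import coo_matrix


def wendland(r):
    # Stationary, compactly supported factor: zero for r >= 1.
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)


def bump(r):
    # Smooth bump function: positive for |r| < 1, exactly zero otherwise.
    out = np.zeros_like(r)
    inside = np.abs(r) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - r[inside] ** 2))
    return out


def features(x, centers, widths, amps):
    # g_i(x) = a_i * bump(|x - c_i| / w_i); returns shape (len(x), n_bumps).
    dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
    return amps[None, :] * bump(dist / widths[None, :])


def kernel_tile(x1, x2, radius, centers, widths, amps):
    # k(x, x') = wendland(|x - x'| / radius) * sum_i g_i(x) g_i(x')
    stat = wendland(np.linalg.norm(x1[:, None, :] - x2[None, :, :], axis=-1) / radius)
    nonstat = features(x1, centers, widths, amps) @ features(x2, centers, widths, amps).T
    return stat * nonstat


def assemble(x, radius, centers, widths, amps, tile=256, workers=4):
    n = len(x)
    starts = range(0, n, tile)
    jobs = [(a, b) for a in starts for b in starts]

    def work(job):
        # Each (row tile, column tile) pair is an independent job that returns only its nonzeros.
        a, b = job
        block = kernel_tile(x[a:a + tile], x[b:b + tile], radius, centers, widths, amps)
        r, c = np.nonzero(block)
        return r + a, c + b, block[r, c]

    with ThreadPoolExecutor(max_workers=workers) as pool:  # stand-in for distributed workers
        parts = list(pool.map(work, jobs))
    rows, cols, vals = (np.concatenate(p) for p in zip(*parts))
    return coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()


rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=(1000, 2))
centers = np.array([[0.25, 0.25], [0.75, 0.75], [0.5, 0.5]])  # hypothetical bump locations
widths = np.array([0.4, 0.4, 0.3])                            # hypothetical bump radii
amps = np.array([1.0, 2.0, 0.5])                              # hypothetical bump amplitudes
K = assemble(x, 0.1, centers, widths, amps)
print(f"density: {K.nnz / 1000**2:.4f}")
```

Points outside every bump receive zero prior variance under this construction, so a noise term must be added before factorizing. Letting the bump locations and radii be trainable is one way such a kernel can learn where correlations exist and, as a consequence, where the matrix is sparse.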
Marcus M. Noack
Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
Mark D. Risser
Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Drive, Berkeley, CA 94720
Hengrui Luo
Unknown affiliation
Vardaan Tekriwal
UC Berkeley, 1 Sproul Hall, Berkeley, CA 94720
Ronald J. Pandolfi
Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720