🤖 AI Summary
Gaussian processes (GPs) face a computational bottleneck in large-scale regression because training scales as $O(n^3)$ in the number of training points $n$. Computation-aware GPs (CAGPs) reduce this cost with probabilistic linear solvers, but the most commonly used CAGP framework produces conservative uncertainty quantification, sometimes dramatically so. This paper introduces CAGP-GS, a CAGP framework aimed at statistically calibrated uncertainty. It proves that if the underlying probabilistic linear solver is calibrated, then the induced CAGP posterior is calibrated as well. CAGP-GS builds its probabilistic linear solver on Gauss–Seidel iterations, so the posterior accounts for the computational uncertainty introduced by stopping the solver early. Calibration is tested on a synthetic problem. On a large-scale global temperature regression task, CAGP-GS compares favorably with existing CAGPs when the test set is low-dimensional and only a few iterations are performed.
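To make the linear-algebra side concrete: exact GP regression needs the representer weights $v = (K + \sigma^2 I)^{-1} y$, and a direct solve costs $O(n^3)$. The sketch below assumes a squared-exponential kernel and toy 1-D data, and it illustrates only the classical (non-probabilistic) Gauss–Seidel iteration that CAGP-GS is based on. It is not the paper's probabilistic solver, and the helper names `rbf_kernel`, `gauss_seidel` and `energy` are hypothetical. Each sweep costs $O(n^2)$, so stopping after a few sweeps trades accuracy for time.

```python
import numpy as np


def rbf_kernel(X1, X2, lengthscale=1.0, outputscale=1.0):
    """Squared-exponential kernel matrix between the rows of X1 and X2 (hypothetical helper)."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return outputscale * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def gauss_seidel(A, b, num_sweeps, x0=None):
    """Run forward Gauss-Seidel sweeps on A x = b, with A symmetric positive definite."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    for _ in range(num_sweeps):
        for i in range(n):
            # x[:i] was already updated in this sweep; x[i+1:] still holds the previous sweep.
            off_diag = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - off_diag) / A[i, i]
    return x


def energy(A, b, x):
    """Quadratic f(x) = 0.5 x^T A x - b^T x; f(x) - f(x*) equals 0.5 * ||x - x*||_A^2."""
    return 0.5 * x @ A @ x - b @ x


# Toy 1-D regression problem (hypothetical data).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
noise_var = 0.1
K_hat = rbf_kernel(X, X) + noise_var * np.eye(len(X))  # K + sigma^2 I

v_exact = np.linalg.solve(K_hat, y)  # O(n^3) reference solution
for sweeps in (1, 5, 25):
    v = gauss_seidel(K_hat, y, sweeps)
    # Gap to the optimum in the quadratic energy, i.e. half the squared K_hat-norm error.
    print(sweeps, energy(K_hat, y, v) - energy(K_hat, y, v_exact))
```

On a symmetric positive definite system, Gauss–Seidel is exact coordinate descent on $f$, so the printed gap can only shrink as sweeps are added; how fast it shrinks depends on the conditioning of $K + \sigma^2 I$.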
📝 Abstract
Gaussian processes are notorious for scaling cubically with the size of the training set, preventing application to very large regression problems. Computation-aware Gaussian processes (CAGPs) tackle this scaling issue by exploiting probabilistic linear solvers to reduce complexity, widening the posterior with additional computational uncertainty to account for the reduced computation. However, the most commonly used CAGP framework results in (sometimes dramatically) conservative uncertainty quantification, making the posterior unrealistic in practice. In this work, we prove that if the utilised probabilistic linear solver is calibrated, in a rigorous statistical sense, then so too is the induced CAGP. We thus propose a new CAGP framework, CAGP-GS, which uses Gauss–Seidel iterations in the underlying probabilistic linear solver. CAGP-GS performs favourably compared to existing approaches when the test set is low-dimensional and few iterations are performed. We test its calibration on a synthetic problem, and compare its performance to existing approaches on a large-scale global temperature regression problem.
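The "widening" of the posterior can be illustrated with a minimal sketch. It assumes the commonly used projection-based CAGP formulation, in which the solver's actions $S$ (an $n \times i$ matrix) define an approximate inverse $C = S(S^\top \hat K S)^{-1} S^\top$ of $\hat K = K + \sigma^2 I$. Under this formulation the posterior mean is $k(x, X)\, C\, y$ and the variance is $k(x, x) - k(x, X)\, C\, k(X, x)$. Because $C \preceq \hat K^{-1}$ in the positive semidefinite order, the variance can only grow relative to the exact GP, and that extra variance is the computational uncertainty. The random actions, the helper `cagp_posterior` and the toy data are hypothetical. This does not reproduce CAGP-GS or its Gauss–Seidel-based actions.

```python
import numpy as np


def rbf_kernel(X1, X2, lengthscale=1.0):
    """Unit-outputscale squared-exponential kernel, so k(x, x) = 1 (hypothetical helper)."""
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def cagp_posterior(X, y, X_test, S, noise_var=0.1):
    """Mean and marginal variance of a projection-based computation-aware GP (zero prior mean).

    The actions S (n x i) define C = S (S^T K_hat S)^{-1} S^T, an approximation of K_hat^{-1}.
    """
    K_hat = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    K_xX = rbf_kernel(X_test, X)
    # The only linear solve is i x i; the remaining work is matrix products costing O(n^2 i).
    KS = K_hat @ S
    C = S @ np.linalg.solve(S.T @ KS, S.T)
    mean = K_xX @ (C @ y)
    # Prior variance (1 for this kernel) minus the reduction achieved by the computation so far.
    var = 1.0 - np.einsum("ij,jk,ik->i", K_xX, C, K_xX)
    return mean, var


rng = np.random.default_rng(1)
X = rng.uniform(0.0, 5.0, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(100)
X_test = np.linspace(0.0, 5.0, 7)[:, None]

# S = I makes C equal K_hat^{-1}, which recovers the exact GP posterior.
mean_exact, var_exact = cagp_posterior(X, y, X_test, np.eye(100))
for i in (5, 20, 60):
    S = rng.standard_normal((100, i))  # hypothetical random actions
    mean_i, var_i = cagp_posterior(X, y, X_test, S)
    # Computational uncertainty: variance added by stopping after i actions (non-negative).
    print(i, np.max(var_i - var_exact))
```

Passing `S = np.eye(n)` is a convenient sanity check because it reproduces the exact GP. With fewer actions, every printed maximum is non-negative, which is the widening the abstract refers to.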