🤖 AI Summary
To address the dual challenges of robustness and computational scalability that Gaussian processes (GPs) face on large-scale, outlier-contaminated datasets, this paper proposes a unified modeling framework that jointly characterizes two sources of uncertainty: computational uncertainty, introduced by sparse GP approximations, and statistical uncertainty, arising from outliers in the data. Methodologically, the approach integrates generalized Bayesian updating, low-rank uncertainty calibration, and an adaptive selection mechanism for robust mean functions. This design retains the time complexity of sparse GPs while providing conservative uncertainty quantification and stable, outlier-resilient mean predictions. Experiments demonstrate substantial improvements in predictive accuracy and optimization stability on outlier-corrupted regression tasks and high-throughput Bayesian optimization benchmarks, while preserving scalability.
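The "low-rank uncertainty calibration" idea can be illustrated with a minimal sketch, assuming the standard computation-aware GP construction in which the exact inverse K̂⁻¹ = (K + σ²I)⁻¹ is replaced by the low-rank matrix C = S(SᵀK̂S)⁻¹Sᵀ built from an action matrix S. The RBF kernel, zero prior mean, noise level, random actions, and the helper names are illustrative assumptions rather than the paper's configuration, and the robust (outlier-handling) part of the model is omitted.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, outputscale=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b (illustrative choice)."""
    diff = a[:, None] - b[None, :]
    return outputscale * np.exp(-0.5 * (diff / lengthscale) ** 2)


def computation_aware_posterior(x_train, y_train, x_test, actions, noise=0.1):
    """Posterior mean and variance at x_test using only the subspace spanned by `actions`.

    C = S (S^T K_hat S)^{-1} S^T replaces the exact inverse K_hat^{-1}.
    C is dominated by K_hat^{-1} in the Loewner order, so the resulting variance
    is never smaller than the exact GP variance; the gap is the computational
    uncertainty. Assumptions: zero prior mean, `noise` is the noise variance.
    """
    k_hat = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    s = actions
    c = s @ np.linalg.solve(s.T @ k_hat @ s, s.T)
    k_star = rbf_kernel(x_test, x_train)
    mean = k_star @ c @ y_train
    # Diagonal of k(x*, x*) - k(x*, X) C k(X, x*)
    var = np.diag(rbf_kernel(x_test, x_test)) - np.einsum("ij,jk,ik->i", k_star, c, k_star)
    return mean, var


rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)
y = np.sin(x) + 0.1 * rng.standard_normal(50)
x_test = np.linspace(0.0, 5.0, 20)

# Identity actions span R^n, so C = K_hat^{-1}: this recovers the exact GP posterior.
mean_exact, var_exact = computation_aware_posterior(x, y, x_test, np.eye(50))
# Five random actions: a cheap low-rank approximation whose variance is inflated.
mean_low, var_low = computation_aware_posterior(x, y, x_test, rng.standard_normal((50, 5)))
```

Comparing `var_low` with `var_exact` shows the conservative behavior: spending less computation never shrinks the predictive variance, it only widens it to account for the unexplored directions of the data.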
📝 Abstract
Gaussian processes (GPs) are widely used for regression and for optimization tasks such as Bayesian optimization (BO) due to their expressiveness and principled uncertainty estimates. However, in settings with large datasets corrupted by outliers, standard GPs and their sparse approximations struggle with both computational tractability and robustness. We introduce the Robust Computation-aware Gaussian Process (RCaGP), a novel GP model that jointly addresses these challenges by combining a principled treatment of approximation-induced uncertainty with robust generalized Bayesian updating. The key insight is that robustness and approximation-awareness are not orthogonal but intertwined: approximations can exacerbate the impact of outliers, and mitigating one without the other is insufficient. Unlike previous work that focuses narrowly on either robustness or approximation quality, RCaGP combines both in a principled and scalable framework, effectively managing outliers as well as the computational uncertainty introduced by approximations such as low-rank matrix multiplications. Our model ensures more conservative and reliable uncertainty estimates, a property we rigorously demonstrate. Additionally, we establish a robustness property and show that the mean function is key to preserving it, which motivates a tailored model selection scheme for robust mean functions. Empirical results on both regression and high-throughput Bayesian optimization benchmarks confirm that solving these challenges jointly leads to superior performance in clean as well as outlier-contaminated settings.
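Why the mean function matters for robustness can be illustrated with a minimal sketch, assuming an inverse-multiquadric (IMQ) weighting function of the kind used in robust generalized-Bayes GP regression, where each observation's influence depends on its residual y - m(x) from the prior mean m. Whether RCaGP uses exactly this weight, the values of `beta` and `c`, and the helper name `imq_weights` are all assumptions made for illustration.

```python
import numpy as np


def imq_weights(y, prior_mean, beta=1.0, c=1.0):
    """Inverse-multiquadric weights w = beta / sqrt(1 + ((y - m(x)) / c)^2).

    Points close to the prior mean get weight near beta; points with large
    residuals are down-weighted. beta and c are illustrative assumptions.
    """
    r = (y - prior_mean) / c
    return beta / np.sqrt(1.0 + r ** 2)


x = np.linspace(0.0, 10.0, 11)
y = 2.0 * x    # clean linear trend
y[5] += 40.0   # one gross outlier at x = 5

# A prior mean that tracks the trend: only the outlier is down-weighted.
w_good = imq_weights(y, prior_mean=2.0 * x)
# A constant zero prior mean: clean points far from zero are down-weighted too.
w_zero = imq_weights(y, prior_mean=np.zeros_like(x))
```

With the trend-following mean, only the corrupted point at x = 5 receives a small weight. With a zero mean, every clean point whose value is large relative to `c` is also down-weighted, so genuine signal would be treated like an outlier. This is one way to see why a poorly chosen mean function can undermine robustness and why the choice of mean function deserves its own selection step.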