🤖 AI Summary
Gaussian processes (GPs) suffer from limited robustness under non-Gaussian noise and sparse outliers, making it challenging to achieve high accuracy and robustness simultaneously. Method: We propose a data-point-level adaptive noise modeling framework that estimates point-specific noise variances. Theoretically, we establish strong concavity of the log marginal likelihood under this parameterization (the first such proof) and derive weak submodularity, enabling a sequential selection algorithm, relevance pursuit, with provable approximation guarantees. Our method combines greedy pursuit-style selection, submodular optimization, and GP marginal likelihood maximization. Results: Experiments on regression and Bayesian optimization tasks with sparse label corruption show that our approach significantly outperforms existing robust GP methods, achieving high predictive accuracy together with theoretical guarantees and practical robustness.
📝 Abstract
Gaussian processes (GPs) are non-parametric probabilistic regression models that are popular due to their flexibility, data efficiency, and well-calibrated uncertainty estimates. However, standard GP models assume homoskedastic Gaussian noise, while many real-world applications are subject to non-Gaussian corruptions. Variants of GPs that are more robust to alternative noise models have been proposed, but they entail significant trade-offs between accuracy and robustness, and between computational requirements and theoretical guarantees. In this work, we propose and study a GP model that achieves robustness against sparse outliers by inferring data-point-specific noise levels with a sequential selection procedure that maximizes the log marginal likelihood, which we refer to as relevance pursuit. We show, surprisingly, that the model can be parameterized such that the associated log marginal likelihood is strongly concave in the data-point-specific noise variances, a property rarely found in either robust regression objectives or GP marginal likelihoods. This in turn implies the weak submodularity of the corresponding subset selection problem, and thereby yields approximation guarantees for the proposed algorithm. We compare the model's performance against other approaches on diverse regression and Bayesian optimization tasks, including the challenging but common setting of sparse corruptions of the labels within or close to the function range.
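The sequential selection procedure described above can be sketched in a few lines. The following is a minimal, hedged illustration (not the paper's actual implementation): it assumes a fixed precomputed kernel matrix `K`, a fixed candidate outlier variance `outlier_var` added to a flagged point's noise (whereas the paper infers point-specific variances by maximum likelihood), and greedily flags the data point whose extra noise variance most increases the GP log marginal likelihood.

```python
import numpy as np

def log_marginal_likelihood(K, y, rho):
    """GP log marginal likelihood with per-point noise variances rho."""
    C = K + np.diag(rho)                      # covariance plus heteroskedastic noise
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))

def relevance_pursuit(K, y, base_noise=1e-2, outlier_var=10.0, max_outliers=3):
    """Greedy sketch: repeatedly flag the point whose added noise
    variance yields the largest gain in log marginal likelihood."""
    n = len(y)
    rho = np.full(n, base_noise)              # homoskedastic starting point
    support = []                              # indices flagged as outliers
    for _ in range(max_outliers):
        base_lml = log_marginal_likelihood(K, y, rho)
        best_gain, best_i = 0.0, None
        for i in range(n):
            if i in support:
                continue
            trial = rho.copy()
            trial[i] += outlier_var           # hypothetical fixed increment
            gain = log_marginal_likelihood(K, y, trial) - base_lml
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:                    # no point improves the likelihood
            break
        rho[best_i] += outlier_var
        support.append(best_i)
    return support, rho
```

The weak submodularity result is what justifies this greedy loop: each step picks the single best addition to the outlier support set, and the approximation guarantee bounds how far the greedy support can fall short of the optimal one.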