🤖 AI Summary
This work addresses the computational limitations of traditional sparse variational Gaussian processes (SVGP), which rely on Cholesky decompositions and are thus ill-suited to low-precision, massively parallel hardware. The authors eliminate explicit matrix inversion by constructing a better-conditioned variational lower bound and deriving natural-gradient updates expressed solely in terms of matrix multiplications. Practical heuristics, such as step-size schedules and stopping criteria, further improve convergence and numerical stability, and the resulting method drops into existing SVGP frameworks unchanged. Empirical evaluations show performance comparable to standard SVGP on both regression and classification tasks, with significantly faster training on modern hardware optimized for parallel linear algebra.
📝 Abstract
Gaussian processes (GPs) offer appealing properties but are costly to train at scale. Sparse variational GP (SVGP) approximations reduce cost yet still rely on Cholesky decompositions of kernel matrices, ill-suited to low-precision, massively parallel hardware. While one can construct valid variational bounds that rely only on matrix multiplications (matmuls) via an auxiliary matrix parameter, optimising them with off-the-shelf first-order methods is challenging. We make the inverse-free approach practical by proposing a better-conditioned bound and deriving a matmul-only natural-gradient update for the auxiliary parameter, markedly improving stability and convergence. We further provide simple heuristics, such as step-size schedules and stopping criteria, that make the overall optimisation routine fit seamlessly into existing workflows. Across regression and classification benchmarks, we demonstrate that our method 1) serves as a drop-in replacement in SVGP-based models (e.g., deep GPs), 2) recovers similar performance to traditional methods, and 3) can be faster than baselines when well tuned.
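The abstract does not spell out how a bound can "rely only on matrix multiplications via an auxiliary matrix parameter." A standard construction (which may or may not be the exact one used here) exploits two inequalities that hold for any symmetric auxiliary matrix `T`: in the PSD order, `K^{-1} ⪰ 2T − TKT`, and `log|K| ≤ tr(TK) − log|T| − M`, both tight at `T = K^{-1}`. Substituting these surrogates for the inverse and log-determinant of the inducing-point kernel matrix yields a valid lower bound computable with matmuls alone, while `T` is optimised as an extra variational parameter. The sketch below checks both inequalities numerically; the names `K`, `T`, and `M` are illustrative, not necessarily the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 5  # number of inducing points in this toy example

# Stand-in for the inducing-point kernel matrix K_uu: symmetric positive definite.
B = rng.standard_normal((M, M))
K = B @ B.T + M * np.eye(M)

# Auxiliary symmetric parameter T: a cheap diagonal guess at K^{-1}.
# In the inverse-free scheme T would be optimised jointly with the ELBO.
T = np.diag(1.0 / np.diag(K))

# Since (T - K^{-1}) K (T - K^{-1}) is PSD for any symmetric T,
#   K^{-1}  >=  2T - T K T   (PSD / Loewner order),
# so 2T - TKT lower-bounds K^{-1} using matrix multiplications only.
surrogate = 2 * T - T @ K @ T
gap_eigs = np.linalg.eigvalsh(np.linalg.inv(K) - surrogate)
assert gap_eigs.min() >= -1e-10  # surrogate never exceeds K^{-1}

# Companion bound for the log-determinant term of the ELBO:
#   log|K|  <=  tr(T K) - log|T| - M,   equality at T = K^{-1}.
_, logdet_K = np.linalg.slogdet(K)
_, logdet_T = np.linalg.slogdet(T)
upper = np.trace(T @ K) - logdet_T - M
assert logdet_K <= upper + 1e-10
```

The `np.linalg.inv` and `slogdet` calls appear only to *verify* the inequalities; the surrogate quantities themselves involve nothing but matmuls (plus the easy `log|T|` when `T` is parameterised, e.g., via a Cholesky factor), which is what makes the bound friendly to low-precision parallel hardware.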