🤖 AI Summary
Gaussian processes scale poorly to large datasets because of their high computational complexity, which limits their use in Bayesian deep learning in particular. This work proposes a sparse inducing kernel approximation built on a dyadic ordered template basis, whose cost depends only logarithmically on the number of inducing points. The method yields a compact yet expressive kernel representation that supports deep feature learning and high-dimensional representations, and it can be embedded in Bayesian neural networks with sparse activations. Combined with tensorized GPU computation, the approach enables efficient training and inference on vision and Transformer-based language benchmarks, speeding up computation while preserving predictive accuracy.
📝 Abstract
Gaussian processes (GPs) provide a principled Bayesian framework for uncertainty estimation, but their computational complexity severely limits scalability to large datasets. We propose SIKA-GP, which accelerates GP inference using sparse inducing kernel approximations based on a dyadic ordered template basis, so that the cost depends on the number of inducing points $M$ only as $O(\log M)$. Our approach constructs compact and expressive kernel representations from sparsely activated bases, enabling efficient tensorized GPU computation and seamless integration with modern large-scale models. SIKA-GP can be naturally embedded into Bayesian neural networks (BNNs) with sparse activations, yielding significant speedups in both training and inference without sacrificing predictive performance. The method naturally extends to deep feature learning, addressing the scalability challenges introduced by deep architectures and high-dimensional feature representations. Empirical results on vision and transformer-based language benchmarks demonstrate that our approach consistently delivers fast and accurate GP models, providing a principled path toward scalable kernel learning.
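To make the logarithmic dependence concrete, here is a minimal sketch of how a dyadic basis can give only about $\log_2 M$ active features per input. It is not the SIKA-GP construction, which the abstract does not spell out. It assumes one-dimensional inputs in $[0, 1]$, hierarchical hat functions as the basis, and per-level weights $2^{-l}$; the function names `dyadic_hat_features` and `sparse_kernel` are hypothetical and chosen only for illustration.

```python
import numpy as np


def dyadic_hat_features(x, levels):
    """Active hat function per dyadic level for inputs in [0, 1].

    Assumed basis (illustrative, not from the paper): level l holds 2**l
    hat functions with disjoint supports, so exactly one is active per level.
    Returns flat indices and values, each of shape (n, levels).
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    idx = np.empty((x.size, levels), dtype=np.int64)
    val = np.empty((x.size, levels), dtype=float)
    for l in range(levels):
        # Cell of x at level l; clip so that x == 1.0 falls in the last cell.
        k = np.minimum(np.floor(x * 2**l).astype(np.int64), 2**l - 1)
        # Levels 0..l-1 contain 2**l - 1 functions in total, hence the offset.
        idx[:, l] = (2**l - 1) + k
        # Hat centred at (2k + 1) / 2**(l + 1) with half-width 2**-(l + 1).
        val[:, l] = 1.0 - np.abs(x * 2 ** (l + 1) - (2 * k + 1))
    return idx, val


def sparse_kernel(x1, x2, levels):
    """Kernel k(x, x') = sum_l w_l * phi_l(x) * phi_l(x') from sparse features.

    Each entry costs O(levels) = O(log M) with M = 2**levels - 1 basis
    functions. The weights w_l = 2**-l are an assumption for this sketch.
    """
    i1, v1 = dyadic_hat_features(x1, levels)
    i2, v2 = dyadic_hat_features(x2, levels)
    w = 2.0 ** -np.arange(levels)
    # Two inputs interact at level l only if they activate the same hat.
    match = i1[:, None, :] == i2[None, :, :]
    return np.sum(w * match * v1[:, None, :] * v2[None, :, :], axis=-1)


if __name__ == "__main__":
    x = np.array([0.1, 0.3, 0.75])
    K = sparse_kernel(x, x, levels=4)  # M = 15 basis functions, 4 active per input
    print(K.shape)
```

With `levels=4` the basis has $M = 15$ functions, yet each kernel entry touches only four of them. Doubling $M$ adds one level, and so one extra term per entry.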