🤖 AI Summary
To address the challenges of hyper-parameter optimization, high communication overhead, and weak privacy preservation in Gaussian processes (GPs) for large-scale, multi-dimensional data, this paper proposes the grid spectral mixture product (GSMP) kernel and the Sparse LInear Multiple Kernel Learning (SLIM-KL) framework. The GSMP kernel employs structured spectral modeling to drastically reduce the number of hyper-parameters while retaining good approximation capability, and its hyper-parameter optimization provably yields sparse solutions. SLIM-KL exploits this sparsity by combining a quantized alternating direction method of multipliers (ADMM) scheme for collaborative learning among multiple agents with a distributed successive convex approximation (DSCA) algorithm for the local subproblems, so that raw data never leaves each agent and communication costs stay low. Theoretical analysis establishes convergence guarantees for the framework, and experiments on diverse datasets demonstrate superior prediction performance and efficiency over competing methods.
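To make the "structured spectral modeling" idea concrete, here is a minimal NumPy sketch of a grid spectral mixture product kernel in the spirit of GSMP: each input dimension gets a 1-D spectral mixture kernel whose frequencies and bandwidths sit on a fixed grid, so only the non-negative mixture weights are learned, and the multi-dimensional kernel is the product of the per-dimension kernels. The function names (`gsm_kernel_1d`, `gsmp_kernel`) and the specific grid values are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def gsm_kernel_1d(tau, mu, sigma, alpha):
    """1-D grid spectral mixture kernel: a weighted sum of
    Gaussian-modulated cosines. Frequencies (mu) and bandwidths
    (sigma) are fixed on a grid; only the weights alpha are learned."""
    # tau: (n1, n2) pairwise input differences; mu, sigma, alpha: (Q,)
    basis = (np.exp(-2.0 * np.pi**2 * tau[..., None]**2 * sigma**2)
             * np.cos(2.0 * np.pi * tau[..., None] * mu))
    return basis @ alpha  # (n1, n2)

def gsmp_kernel(X1, X2, mus, sigmas, alphas):
    """Product of per-dimension 1-D GSM kernels, so the number of
    hyper-parameters grows linearly with input dimension rather
    than exponentially with a full multi-dimensional grid."""
    K = np.ones((X1.shape[0], X2.shape[0]))
    for d in range(X1.shape[1]):
        tau = X1[:, d][:, None] - X2[:, d][None, :]
        K *= gsm_kernel_1d(tau, mus[d], sigmas[d], alphas[d])
    return K
```

Because each 1-D kernel evaluates to the sum of its weights at zero lag, the diagonal of `gsmp_kernel(X, X, ...)` is the product over dimensions of those weight sums, which gives a quick sanity check on an implementation.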
📝 Abstract
Gaussian processes (GPs) stand as crucial tools in machine learning and signal processing, with their effectiveness hinging on kernel design and hyper-parameter optimization. This paper presents a novel GP linear multiple kernel (LMK) and a generic sparsity-aware distributed learning framework to optimize the hyper-parameters. The newly proposed grid spectral mixture product (GSMP) kernel is tailored for multi-dimensional data, effectively reducing the number of hyper-parameters while maintaining good approximation capability. We further demonstrate that the associated hyper-parameter optimization of this kernel yields sparse solutions. To exploit the inherent sparsity of the solutions, we introduce the Sparse LInear Multiple Kernel Learning (SLIM-KL) framework. The framework incorporates a quantized alternating direction method of multipliers (ADMM) scheme for collaborative learning among multiple agents, where the local optimization problem is solved using a distributed successive convex approximation (DSCA) algorithm. SLIM-KL effectively manages large-scale hyper-parameter optimization for the proposed kernel, simultaneously ensuring data privacy and minimizing communication costs. Theoretical analysis establishes convergence guarantees for the learning framework, while experiments on diverse datasets demonstrate the superior prediction performance and efficiency of our proposed methods.
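To illustrate the quantized-ADMM ingredient of the framework, the following is a hedged toy sketch of consensus ADMM in which agents exchange only uniformly quantized iterates, so private local data never leaves an agent and per-round communication is reduced. Each agent here holds a simple private quadratic objective; the paper's actual scheme instead pairs quantized ADMM with DSCA to handle the non-convex kernel hyper-parameter subproblems, and the function names and quantization step are assumptions for illustration.

```python
import numpy as np

def quantize(v, step=0.01):
    """Uniform quantization of a transmitted vector: agents send
    coarse iterates instead of full-precision ones."""
    return step * np.round(v / step)

def quantized_consensus_admm(local_targets, rho=1.0, n_iter=100, step=0.01):
    """Toy consensus ADMM: agent i privately holds the objective
    0.5 * ||x - t_i||^2; all agents agree on a global x while
    sharing only quantized local iterates (each t_i stays local)."""
    n = len(local_targets)
    z = np.zeros_like(local_targets[0])              # global consensus variable
    u = [np.zeros_like(t) for t in local_targets]    # scaled dual variables
    for _ in range(n_iter):
        # Local primal update (closed form for the quadratic objective).
        x = [(t + rho * (z - ui)) / (1.0 + rho)
             for t, ui in zip(local_targets, u)]
        # Agents transmit quantized iterates only.
        xq = [quantize(xi + ui, step) for xi, ui in zip(x, u)]
        z = sum(xq) / n                              # consensus (averaging) update
        u = [ui + xi - z for xi, ui in zip(x, u)]    # dual update
    return z
```

With quadratic local objectives the consensus variable converges to the average of the private targets, up to an error bounded by the quantization step; tightening `step` trades communication savings against accuracy.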