🤖 AI Summary
Multivariate Gaussian processes (GPs) face severe computational bottlenecks in high-dimensional spatio-temporal modeling. To address this, we present DALIA, a scalable framework for Bayesian inference on spatio-temporal multivariate GPs based on integrated nested Laplace approximations. The method combines: (i) a sparse precision (inverse covariance) matrix formulation of the GP; (ii) GPU-accelerated block-dense linear algebra; and (iii) a hierarchical, three-layer distributed-memory parallel scheme. On 496 NVIDIA GH200 Superchips of the Alps supercomputer, the framework reaches strong-scaling speedups of three orders of magnitude. Its weak-scaling performance surpasses the state of the art by two orders of magnitude on a model whose parameter space is 8× larger. Applied to 48 days of air pollution data from northern Italy, it yields refined spatial resolution over the aggregated pollutant measurements. The work thus combines statistical rigor with engineering scalability for large-scale spatio-temporal dependency inference.
📝 Abstract
Multivariate Gaussian processes (GPs) offer a powerful probabilistic framework to represent complex interdependent phenomena. They pose, however, significant computational challenges in high-dimensional settings, which frequently arise in spatial-temporal applications. We present DALIA, a highly scalable framework for performing Bayesian inference tasks on spatio-temporal multivariate GPs, based on the methodology of integrated nested Laplace approximations. Our approach relies on a sparse inverse covariance matrix formulation of the GP, puts forward a GPU-accelerated block-dense approach, and introduces a hierarchical, triple-layer, distributed memory parallel scheme. We showcase weak scaling performance surpassing the state-of-the-art by two orders of magnitude on a model whose parameter space is 8$\times$ larger and measure strong scaling speedups of three orders of magnitude when running on 496 GH200 superchips on the Alps supercomputer. Applying DALIA to air pollution data from northern Italy over 48 days, we showcase refined spatial resolutions over the aggregated pollutant measurements.
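To illustrate why a sparse inverse covariance (precision) formulation helps, the following minimal Python sketch evaluates a zero-mean Gaussian log-density using only a sparse precision matrix. It is not DALIA's implementation: the AR(1) time-series model, the function names `ar1_precision` and `gaussian_logpdf_from_precision`, and the use of SciPy's sparse LU factorization are all assumptions chosen for a small, self-contained example.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu


def ar1_precision(n, phi):
    """Tridiagonal precision matrix of a stationary AR(1) process
    with unit innovation variance (illustrative model, not from the paper)."""
    main = np.full(n, 1.0 + phi**2)
    main[0] = main[-1] = 1.0          # boundary entries differ from the interior
    off = np.full(n - 1, -phi)
    return sp.diags([off, main, off], offsets=[-1, 0, 1], format="csc")


def gaussian_logpdf_from_precision(x, Q):
    """log N(x | 0, Q^{-1}) computed from the sparse precision matrix Q,
    without ever forming the dense covariance matrix."""
    n = Q.shape[0]
    lu = splu(Q)
    # L has a unit diagonal, so log|det Q| is the sum of log|U_ii|.
    logdet_Q = np.sum(np.log(np.abs(lu.U.diagonal())))
    quad = x @ (Q @ x)
    return 0.5 * logdet_Q - 0.5 * quad - 0.5 * n * np.log(2.0 * np.pi)


if __name__ == "__main__":
    Q = ar1_precision(1000, 0.9)
    x = np.random.default_rng(0).standard_normal(1000)
    print("nonzeros in Q:", Q.nnz, "of", Q.shape[0] ** 2)
    print("log-density:", gaussian_logpdf_from_precision(x, Q))
```

The covariance of this AR(1) process is fully dense, while its precision matrix has only about 3n nonzeros, so the factorization and the quadratic form stay cheap as n grows. Spatio-temporal models have more elaborate sparsity patterns, but the same principle applies.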