Scaled Block Vecchia Approximation for High-Dimensional Gaussian Process Emulation on GPUs

📅 2025-04-16
🤖 AI Summary
Exact Gaussian processes (GPs) scale poorly to million-point scientific simulation data because of their computational and memory cost. This paper introduces the Scaled Block Vecchia (SBV) algorithm for distributed GPU-based systems, which the authors describe as, to the best of their knowledge, the first distributed implementation of any Vecchia-based GP variant. SBV combines the Scaled Vecchia approach for anisotropic input scaling with the Block Vecchia (BV) method, uses MPI for inter-node parallelism, and relies on the MAGMA library for GPU-accelerated batched matrix computations. Experiments cover synthetic and real-world workloads, including a 50M-point simulation from a respiratory disease model. SBV achieves near-linear scalability on up to 64 A100 and GH200 GPUs, handles 320M points, and reduces energy use relative to exact GP solvers.

📝 Abstract
Emulating computationally intensive scientific simulations is essential to enable uncertainty quantification, optimization, and decision-making at scale. Gaussian Processes (GPs) offer a flexible and data-efficient foundation for statistical emulation, but their poor scalability limits applicability to large datasets. We introduce the Scaled Block Vecchia (SBV) algorithm for distributed GPU-based systems. SBV integrates the Scaled Vecchia approach for anisotropic input scaling with the Block Vecchia (BV) method to reduce computational and memory complexity while leveraging GPU acceleration techniques for efficient linear algebra operations. To the best of our knowledge, this is the first distributed implementation of any Vecchia-based GP variant. Our implementation employs MPI for inter-node parallelism and the MAGMA library for GPU-accelerated batched matrix computations. We demonstrate the scalability and efficiency of the proposed algorithm through experiments on synthetic and real-world workloads, including a 50M point simulation from a respiratory disease model. SBV achieves near-linear scalability on up to 64 A100 and GH200 GPUs, handles 320M points, and reduces energy use relative to exact GP solvers, establishing SBV as a scalable and energy-efficient framework for emulating large-scale scientific models on GPU-based distributed systems.
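
The combination described in the abstract can be illustrated with a small CPU-only sketch in NumPy. It is not the authors' implementation: the squared-exponential kernel, the contiguous block layout, the rule of picking the `m` nearest earlier points by scaled distance to the block center, and all function names (`scaled_kernel`, `gaussian_logpdf`, `block_vecchia_loglik`) are assumptions made for illustration. The idea shown is that inputs are divided by per-dimension ranges `beta` (anisotropic scaling), and the joint density is approximated as a product of block conditionals, each conditioned on a small neighbor set instead of on all earlier points.

```python
import numpy as np

def scaled_kernel(xa, xb, beta, variance=1.0):
    """Squared-exponential kernel on inputs divided by per-dimension ranges beta."""
    a, b = xa / beta, xb / beta
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2)

def gaussian_logpdf(y, mean, cov):
    """Log-density of N(mean, cov) at y, computed through a Cholesky factor."""
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, y - mean)  # whitened residual
    return (-0.5 * (z @ z) - np.log(np.diag(chol)).sum()
            - 0.5 * y.size * np.log(2.0 * np.pi))

def block_vecchia_loglik(x, y, beta, blocks, m, nugget=1e-3):
    """Approximate zero-mean GP log-likelihood as a product of block conditionals.

    blocks: ordered list of integer index arrays (hypothetical layout).
    Each block conditions on at most m earlier points, chosen here by
    distance to the block center in the scaled input space (an assumption).
    """
    xs = x / beta
    seen = np.empty(0, dtype=int)
    total = 0.0
    for idx in blocks:
        cov = scaled_kernel(x[idx], x[idx], beta) + nugget * np.eye(idx.size)
        mean = np.zeros(idx.size)
        if seen.size > 0:
            dist = np.linalg.norm(xs[seen] - xs[idx].mean(axis=0), axis=1)
            nn = seen[np.argsort(dist)[:m]]
            k_nn = scaled_kernel(x[nn], x[nn], beta) + nugget * np.eye(nn.size)
            k_bn = scaled_kernel(x[idx], x[nn], beta)
            w = np.linalg.solve(k_nn, k_bn.T)
            mean = w.T @ y[nn]     # conditional mean given the neighbors
            cov = cov - k_bn @ w   # conditional covariance (Schur complement)
        total += gaussian_logpdf(y[idx], mean, cov)
        seen = np.concatenate([seen, idx])
    return total
```

When `m` is at least the number of earlier points, every block conditions on everything before it and the product reduces to the exact log-likelihood by the chain rule. Smaller `m` trades accuracy for cost, since each block then needs only factorizations of size `m` and of its own block size.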
Problem

Research questions and friction points this paper is trying to address.

Emulate large-scale scientific simulations efficiently
Improve Gaussian Process scalability for big datasets
Enable distributed GPU-based GP emulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scaled Block Vecchia (SBV) algorithm for distributed GPU-based systems
MPI for inter-node parallelism and MAGMA for GPU-accelerated batched matrix computations
Near-linear scalability on up to 64 A100 and GH200 GPUs; handles 320M points
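
To give a sense of why batched matrix computations suit this workload, the sketch below evaluates many small, independent Gaussian log-densities with a single stacked Cholesky call. It uses NumPy on the CPU purely as a stand-in and does not call MAGMA or MPI. The function name `batched_gaussian_logpdf` and the assumption that all blocks share one size are hypothetical simplifications, not details taken from the paper.

```python
import numpy as np

def batched_gaussian_logpdf(resid, covs):
    """Log-densities of N(0, covs[i]) at resid[i] for a stack of blocks.

    resid: (B, b) residuals; covs: (B, b, b) symmetric positive definite.
    Equal block size b is assumed so the blocks stack into one array.
    """
    chol = np.linalg.cholesky(covs)                      # B factorizations in one call
    z = np.linalg.solve(chol, resid[..., None])[..., 0]  # whitened residuals, (B, b)
    logdet = 2.0 * np.log(np.diagonal(chol, axis1=-2, axis2=-1)).sum(axis=-1)
    b = resid.shape[-1]
    return -0.5 * ((z ** 2).sum(axis=-1) + logdet + b * np.log(2.0 * np.pi))
```

Summing the returned vector gives the contribution of these blocks to an approximate log-likelihood. In a distributed setting each process would presumably handle its own subset of blocks and combine partial sums with a reduction.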