Adaptive Multidimensional Quadrature on Multi-GPU Systems

📅 2025-11-03
🤖 AI Summary
To address load imbalance and poor convergence robustness in high-dimensional adaptive numerical integration on multi-GPU systems, this paper proposes a decentralized distributed algorithm. The method employs hierarchical domain decomposition and local error-driven recursive subdivision, enabling independent adaptive partitioning on each GPU. A cyclic polling-based dynamic load redistribution mechanism is designed, leveraging non-blocking CUDA-aware MPI for low-overhead inter-GPU communication—without requiring global synchronization or centralized scheduling. Experiments on typical 10–50 dimensional integral problems demonstrate that the proposed approach achieves 1.8–3.2× higher computational efficiency compared to state-of-the-art GPU-accelerated integration libraries (e.g., Cuba-GPU, GpuQUAD). Moreover, it exhibits significantly enhanced robustness against degradation in integrand regularity and variations in target accuracy.

📝 Abstract
We introduce a distributed adaptive quadrature method that formulates multidimensional integration as a hierarchical domain decomposition problem on multi-GPU architectures. The integration domain is recursively partitioned into subdomains whose refinement is guided by local error estimators. Each subdomain evolves independently on a GPU, which exposes a significant load imbalance as the adaptive process progresses. To address this challenge, we introduce a decentralised load redistribution scheme based on a cyclic round-robin policy. This strategy dynamically rebalances subdomains across devices through non-blocking, CUDA-aware MPI communication that overlaps with computation. The proposed strategy has two main advantages compared to a state-of-the-art GPU-tailored package: higher efficiency in high dimensions, and improved robustness with respect to the integrand regularity and the target accuracy.
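As a rough single-process illustration of error-driven recursive subdivision, the sketch below refines the box with the largest local error estimate until a target tolerance is met. It is not the paper's method: the quadrature rule (tensor-product 2-point Gauss-Legendre), the error estimator (difference between a coarse estimate and the sum over two halves), the widest-axis split, and all function names are assumptions chosen for brevity. The estimator only probes the split axis, so it can miss error along other axes for strongly anisotropic integrands.

```python
import heapq
import math
from itertools import product


def gauss2_box(f, lo, hi):
    """Tensor-product 2-point Gauss-Legendre rule on the box [lo, hi]."""
    d = len(lo)
    node = 1.0 / math.sqrt(3.0)
    vol = math.prod(b - a for a, b in zip(lo, hi))
    total = 0.0
    for signs in product((-1.0, 1.0), repeat=d):
        x = [0.5 * (a + b) + 0.5 * (b - a) * s * node
             for a, b, s in zip(lo, hi, signs)]
        total += f(x)
    return vol * total / 2 ** d


def evaluate(f, lo, hi):
    """Return (refined estimate, local error estimate, two child boxes)."""
    # Assumption: bisect along the widest axis.
    axis = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
    mid = 0.5 * (lo[axis] + hi[axis])
    left_hi, right_lo = list(hi), list(lo)
    left_hi[axis] = mid
    right_lo[axis] = mid
    children = ((list(lo), left_hi), (right_lo, list(hi)))
    coarse = gauss2_box(f, lo, hi)
    fine = sum(gauss2_box(f, c_lo, c_hi) for c_lo, c_hi in children)
    return fine, abs(fine - coarse), children


def adaptive_integrate(f, lo, hi, tol=1e-8, max_boxes=10000):
    """Refine the box with the largest local error until the sum of errors <= tol."""
    est, err, children = evaluate(f, lo, hi)
    heap = [(-err, 0, est, children)]  # max-heap on error via negated key
    counter = 1                        # unique tie-breaker for heap entries
    total_err = err
    while total_err > tol and len(heap) < max_boxes:
        neg_err, _, _, kids = heapq.heappop(heap)
        total_err += neg_err           # remove the parent's error contribution
        for c_lo, c_hi in kids:
            c_est, c_err, c_kids = evaluate(f, c_lo, c_hi)
            heapq.heappush(heap, (-c_err, counter, c_est, c_kids))
            counter += 1
            total_err += c_err
    # Recompute the sums from the final partition to limit rounding drift.
    value = sum(item[2] for item in heap)
    error = sum(-item[0] for item in heap)
    return value, error


value, error = adaptive_integrate(lambda x: math.exp(x[0] + x[1]),
                                  [0.0, 0.0], [1.0, 1.0])
```

In a multi-GPU setting, each device would hold its own set of boxes and refine them independently, which is where the load imbalance described in the abstract comes from: devices whose boxes cover irregular regions of the integrand accumulate far more work than the others.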
Problem

Research questions and friction points this paper is trying to address.

Develops adaptive quadrature for multidimensional integration on multi-GPU systems
Addresses load imbalance via decentralized redistribution during domain decomposition
Improves efficiency and robustness in high-dimensional integration problems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical domain decomposition for multidimensional integration
Decentralized load balancing using cyclic round-robin policy
Non-blocking CUDA-aware MPI communication overlapping computation
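The abstract does not spell out the redistribution protocol, so the following is only a toy simulation of one plausible cyclic round-robin policy, run in a single process with no MPI or GPU involved. The pairing rule (rank `i` talks to rank `(i + shift) % P`, with `shift` cycling through `1..P-1`), the "send half the surplus" rule, and the function names are all hypothetical. Loads are plain integer counts of pending subdomains.

```python
def redistribute_round(loads, shift):
    """One round: rank i compares its load with rank (i + shift) % P
    and hands over half of its surplus, if it has one."""
    p = len(loads)
    snapshot = list(loads)  # every rank decides from the same pre-round state
    new = list(loads)
    for i in range(p):
        j = (i + shift) % p
        surplus = snapshot[i] - snapshot[j]
        if surplus > 1:
            moved = surplus // 2
            new[i] -= moved
            new[j] += moved
    return new


def balance(loads, sweeps=3):
    """Cycle through all shifts 1..P-1, repeated for a fixed number of sweeps."""
    p = len(loads)
    for _ in range(sweeps):
        for shift in range(1, p):
            loads = redistribute_round(loads, shift)
    return loads


balanced = balance([1000, 0, 0, 0])
```

Each rank only ever looks at one partner per round, so no global view of the load and no central scheduler is needed. In a real implementation the pairwise exchange would presumably be posted with non-blocking sends and receives so that the transfer overlaps with ongoing quadrature work, as the abstract describes.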
Melanie Tonarelli
Euler Institute, Faculty of Informatics, Università della Svizzera Italiana, Lugano, Switzerland, and Politecnico di Milano, Milan, Italy
Simone Riva
Istituto ricerche solari Aldo e Cele Daccò (IRSOL), Faculty of Informatics, Università della Svizzera Italiana, Locarno, Switzerland
Pietro Benedusi
Istituto ricerche solari Aldo e Cele Daccò (IRSOL), Faculty of Informatics, Università della Svizzera Italiana, Locarno, Switzerland
Fabrizio Ferrandi
Politecnico di Milano, Milan, Italy
Rolf Krause
Full Professor, KAUST
Numerical Solution of PDEs, Machine Learning, Multigrid/Domain Decomposition, Contact Problems