🤖 AI Summary
This work addresses the key challenge of enabling efficient, cross-architecture execution of the DD-αAMG multigrid solver for lattice QCD simulations on heterogeneous exascale supercomputers (e.g., ORISE). Method: We present the first full port of DD-αAMG to the HIP platform, unifying acceleration across both NVIDIA and AMD GPUs; introduce an improved Richardson smoother that achieves a superior convergence-versus-speed trade-off, significantly outperforming GCR smoothing and lagging SAP smoothing by only ~10%; and port odd-even-preconditioned variants of GMRES and Richardson, applying advanced SIMD vectorization and computational restructuring to coarse-grid operations. Contribution/Results: Experiments on ORISE demonstrate efficient heterogeneous acceleration, with substantial performance gains in the critical coarse-grid operations. The proposed framework establishes a scalable, cross-platform AMG solver paradigm tailored for large-scale lattice QCD computations.
📝 Abstract
Multigrid solvers are standard in modern scientific computing simulations. Domain Decomposition Aggregation-Based Algebraic Multigrid, also known as the DD-$\alpha$AMG solver, is a successful realization of an algebraic multigrid solver for lattice quantum chromodynamics. Its CPU implementation has made it possible to run, for some particular discretizations, simulations that would otherwise be computationally infeasible, and it has furthermore motivated the development and improvement of other algebraic multigrid solvers in the area. Starting from an existing version of DD-$\alpha$AMG already partially ported via CUDA to run some finest-level operations of the multigrid solver on NVIDIA GPUs, we translate the CUDA code here by using HIP to run on the ORISE supercomputer. We moreover extend the smoothers available in DD-$\alpha$AMG, paying particular attention to Richardson smoothing, which in our numerical experiments has led to a multigrid solver faster than smoothing with GCR and only 10% slower than smoothing with SAP. We then port the odd-even-preconditioned versions of GMRES and Richardson via CUDA. Finally, we accelerate some computationally intensive coarse-grid operations via advanced vectorization.
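To make the smoothing idea concrete, the following is a minimal, illustrative sketch of Richardson iteration applied as a fixed-sweep smoother for a generic linear system $Ax = b$. DD-$\alpha$AMG applies the analogous update to the (much larger, structured) lattice Dirac operator; the function name, the dense-matrix setup, and the choice of relaxation parameter `omega` below are all assumptions for illustration, not the solver's actual API.

```python
# Hypothetical sketch of Richardson smoothing: x <- x + omega * (b - A x).
# In a multigrid cycle, only a few such sweeps are applied per level.
import numpy as np

def richardson_smooth(A, b, x0, omega, num_iters):
    """Apply num_iters Richardson sweeps with relaxation parameter omega."""
    x = x0.copy()
    for _ in range(num_iters):
        x += omega * (b - A @ x)  # one sweep: correct x by the scaled residual
    return x

# Toy example: a diagonally dominant system, for which a small omega converges.
rng = np.random.default_rng(0)
A = np.diag(np.full(8, 4.0)) + 0.1 * rng.standard_normal((8, 8))
b = rng.standard_normal(8)
x = richardson_smooth(A, b, np.zeros(8), omega=0.2, num_iters=50)
print(np.linalg.norm(b - A @ x))  # residual norm after smoothing
```

The appeal of Richardson as a smoother is that each sweep is only one operator application plus a vector update, which maps well onto GPUs; the trade-off reported above is that it converges somewhat more slowly per sweep than SAP.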