An efficient multi-GPU implementation for the Discontinuous Galerkin ocean model SLIM

📅 2026-05-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the high computational cost of unstructured-mesh ocean models, particularly in high-resolution coastal simulations. The authors present a GPU implementation of SLIM, a fully three-dimensional discontinuous Galerkin (DG) ocean model. The implementation is optimized for both single-GPU and multi-GPU systems and runs on both NVIDIA and AMD architectures. It combines matrix-free solvers for key vertical processes, an optimized memory layout, and kernel-level parallelism, and it delivers substantial performance gains. A single A100 GPU performs like approximately 1,500 CPU cores, a 4×A100 node runs ~50× faster than a 128-core CPU node, and weak-scaling efficiency is maintained up to 1,024 GPUs. The model enables a Great Barrier Reef simulation at a spatial resolution five times finer than the most accurate existing model, while still running 100 times faster than real time.
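
The summary credits part of the speedup to matrix-free vertical solvers and an optimized memory layout, but the paper's kernels are not reproduced here. The following is a minimal Python sketch, under stated assumptions, of one common way to realize these ideas. It performs implicit (backward-Euler) vertical diffusion with the Thomas algorithm, solved independently in every water column. The solver is "matrix-free" in the sense that the coefficients are generated per column and no global sparse matrix is assembled. Several details are hypothetical: the layer-major layout `field[k * n_cols + col]`, the uniform layer thickness and diffusivity, the no-flux boundaries, and the names `solve_tridiagonal` and `implicit_vertical_diffusion`. A DG discretization such as SLIM's couples several nodes per layer, so it would produce a block-tridiagonal system rather than this scalar one.

```python
from typing import List


def solve_tridiagonal(lower: List[float], diag: List[float],
                      upper: List[float], rhs: List[float]) -> List[float]:
    """Thomas algorithm for one column (no pivoting; assumes diagonal dominance).

    lower[k] couples layer k to layer k-1 (lower[0] is unused);
    upper[k] couples layer k to layer k+1 (upper[-1] is unused).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for k in range(1, n):  # forward elimination, top to bottom
        denom = diag[k] - lower[k] * c[k - 1]
        c[k] = upper[k] / denom if k < n - 1 else 0.0
        d[k] = (rhs[k] - lower[k] * d[k - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for k in range(n - 2, -1, -1):  # back substitution, bottom to top
        x[k] = d[k] - c[k] * x[k + 1]
    return x


def implicit_vertical_diffusion(field: List[float], n_cols: int, kappa: float,
                                dz: float, dt: float) -> List[float]:
    """One backward-Euler step of dT/dt = d/dz(kappa dT/dz) in every column.

    Hypothetical layer-major layout: field[k * n_cols + col] holds layer k of
    column col. Uniform kappa and dz; no-flux surface and bottom boundaries.
    """
    n_layers = len(field) // n_cols
    assert n_layers >= 2 and n_layers * n_cols == len(field)
    r = kappa * dt / (dz * dz)
    out = [0.0] * len(field)
    for col in range(n_cols):  # columns are independent: one GPU thread each
        column = field[col::n_cols]  # strided gather of layers 0..n_layers-1
        # The three diagonals are generated on the fly for this column only;
        # no global sparse matrix is ever assembled.
        lower = [-r] * n_layers
        upper = [-r] * n_layers
        diag = [1.0 + 2.0 * r] * n_layers
        diag[0] = diag[-1] = 1.0 + r  # no-flux ends conserve the column sum
        out[col::n_cols] = solve_tridiagonal(lower, diag, upper, column)
    return out


if __name__ == "__main__":
    # Two columns, three layers each, stored layer by layer.
    field = [1.0, 0.0,   # layer 0 (surface)
             0.0, 0.0,   # layer 1
             0.0, 3.0]   # layer 2 (bottom)
    result = implicit_vertical_diffusion(field, n_cols=2, kappa=1.0, dz=1.0, dt=1.0)
    print([round(v, 4) for v in result])
```

The layer-major layout is assumed because it suits a one-thread-per-column GPU kernel. At each elimination step `k`, neighbouring threads read neighbouring addresses, which favours coalesced memory access. A column-contiguous layout would instead suit a per-column CPU loop.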

📝 Abstract

Unstructured-mesh ocean models are increasingly used for coastal applications due to their ability to represent complex geometries and apply local grid refinement where needed. However, their broader use has been hindered by their high computational cost, particularly for models based on the Discontinuous Galerkin finite element (DG-FE) method, which involves significantly more degrees of freedom than traditional finite volume or continuous finite element approaches. The rapid emergence of GPU-based high-performance computing architectures now offers a pathway to address this limitation, as DG-FE formulations are inherently well suited to massively parallel, element-wise computations. Here, we present a full 3D DG-FE ocean model implementation optimized for both single- and multi-GPU systems, with support for both NVIDIA and AMD architectures. We detail the computational strategies employed to achieve high performance, including memory layout optimization, kernel-level parallelization, and matrix-free solvers for key vertical processes. Benchmark results demonstrate that a single HPC-grade GPU (e.g., NVIDIA A100) delivers performance equivalent to approximately 1500 CPU cores, while replacing a 128-core CPU node with a 4×A100 GPU node yields a speedup of around 50×. Weak-scaling efficiency is maintained up to 1024 GPUs. We further demonstrate the model's capabilities on a real-world application in the Great Barrier Reef, achieving a spatial resolution five times finer than the most accurate existing model while maintaining a physical-to-numerical time ratio of 100 (i.e., 100 units of simulated time per unit of wall-clock time). These results highlight how GPU-accelerated DG-FE methods can dramatically advance the capabilities of unstructured-mesh ocean modeling, enabling ultra-high-resolution coastal simulations that were previously infeasible.
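
The abstract attributes much of DG-FE's cost to its larger number of degrees of freedom. The minimal Python sketch below shows the bookkeeping behind that claim. It assumes a 2D triangular mesh extruded into `n_layers` layers of prisms and linear (P1) elements, and it counts the degrees of freedom of one scalar field under finite volume, continuous, and discontinuous discretizations. The function name `count_dofs`, the use of linear prisms, and the mesh sizes in the example are illustrative assumptions, not SLIM's documented configuration.

```python
from dataclasses import dataclass


@dataclass
class DofCounts:
    finite_volume: int
    continuous_p1: int
    discontinuous_p1: int


def count_dofs(n_triangles: int, n_vertices_2d: int, n_layers: int) -> DofCounts:
    """Per-field DOF counts on a triangle mesh extruded into prism layers."""
    n_prisms = n_triangles * n_layers
    return DofCounts(
        # Cell-centred finite volume: one value per prism.
        finite_volume=n_prisms,
        # Continuous P1: one node per 3D vertex, shared by neighbouring prisms.
        continuous_p1=n_vertices_2d * (n_layers + 1),
        # Discontinuous P1: 3 triangle nodes x 2 levels = 6 nodes per prism,
        # duplicated on every element face instead of being shared.
        discontinuous_p1=6 * n_prisms,
    )


if __name__ == "__main__":
    # Hypothetical mesh: roughly two triangles per vertex, as in large triangulations.
    print(count_dofs(n_triangles=2000, n_vertices_2d=1000, n_layers=10))
```

For this hypothetical mesh the counts are 20,000 (finite volume), 11,000 (continuous) and 120,000 (discontinuous). The DG field therefore carries about 6× the finite-volume count and about 11× the continuous count, which is the kind of overhead that element-wise GPU parallelism is meant to absorb.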

Problem

Research questions and friction points this paper is trying to address.

Discontinuous Galerkin

ocean modeling

high computational cost

unstructured-mesh

GPU acceleration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Discontinuous Galerkin

multi-GPU acceleration

unstructured-mesh ocean modeling