Accelerating High-Order Finite Element Simulations at Extreme Scale with FP64 Tensor Cores

📅 2026-03-09

🤖 AI Summary

This work targets the efficiency of GPU-accelerated high-order finite element simulations, which are computationally intensive and for which further gains in speed and energy efficiency are in constant demand. To the authors' knowledge, it is the first to program FP64 Tensor Cores directly to accelerate large-scale finite element computations, combining them with kernel fusion in key kernels of the MFEM library. On NVIDIA's Grace Hopper GH200 and Grace Blackwell GB200 architectures, the optimized kernels achieve up to 2× speedup and up to 83% better energy efficiency. On nearly 10,000 GPUs of the Alps system, they show near-perfect weak scaling and 90% strong scaling efficiency. The new algorithms and MFEM enhancements benefit production codes, including the real-time tsunami forecasting application that won the 2025 Gordon Bell Prize.

📝 Abstract

Finite element simulations play a critical role in a wide range of applications, from automotive design to tsunami modeling and computational electromagnetics. Performing these simulations efficiently at the high resolutions needed for practical applications and scientific insights necessitates the use of high-order methods and large-scale supercomputing. While much progress has been made in porting finite element codes to GPU systems in recent years, additional improvements in the efficiency and computational speed of GPU-accelerated high-order finite element simulations are in constant demand. In this paper, we demonstrate that the FP64 tensor cores on NVIDIA GPUs can be used to further accelerate such simulations, achieving significant speedups in key kernels of MFEM, a scalable open-source finite element library widely used in HPC applications. By integrating FP64 tensor cores with kernel fusion optimizations, we were able to achieve up to 2× performance gains and up to 83% energy efficiency gains on NVIDIA's Grace Hopper GH200 and Grace Blackwell GB200 architectures. To the best of our knowledge, this is the first time that FP64 tensor cores have been directly programmed to accelerate large-scale finite element scientific computing applications. We demonstrate the performance of the optimized kernels at exascale by showing near-perfect weak scaling efficiency and 90% strong scaling efficiency across nearly 10,000 GPUs on the Alps system. The new algorithms and MFEM enhancements directly benefit complex production codes, including the 2025 Gordon Bell Prize-winning application for real-time tsunami forecasting.
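
The abstract does not spell out the kernels themselves. As background, high-order operators on tensor-product elements are commonly evaluated by sum factorization, which turns the per-element work into small dense matrix products, and those are the kind of operation that matrix-multiply hardware accelerates. The sketch below is a plain-Python illustration of that idea for a single 2D element. It is not MFEM code and not the paper's implementation; the names `B`, `X`, `interpolate_naive`, and `interpolate_sum_factorized`, as well as the sizes and values, are hypothetical.

```python
# Illustrative sketch only: not MFEM code and not the paper's kernels.
# B (Q x P) holds 1D basis functions evaluated at 1D quadrature points.
# X (P x P) holds the degrees of freedom of one 2D tensor-product element.

def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def interpolate_naive(B, X):
    """u[q1][q2] = sum over i, j of B[q1][i] * B[q2][j] * X[i][j]."""
    Q, P = len(B), len(B[0])
    return [[sum(B[q1][i] * B[q2][j] * X[i][j]
                 for i in range(P) for j in range(P))
             for q2 in range(Q)]
            for q1 in range(Q)]

def interpolate_sum_factorized(B, X):
    """Same values computed as two small dense products: U = B X B^T."""
    return matmul(matmul(B, X), transpose(B))

# Hypothetical sizes: P = 3 basis functions and Q = 4 quadrature points per direction.
B = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]

U_naive = interpolate_naive(B, X)
U_fact = interpolate_sum_factorized(B, X)

# Largest entrywise difference between the two evaluations.
max_diff = max(abs(a - b)
               for row_a, row_b in zip(U_naive, U_fact)
               for a, b in zip(row_a, row_b))
```

Both functions produce the same Q × Q array of values at the quadrature points. The factorized form does the work as two small matrix products instead of a double sum per quadrature point, and the saving grows with the polynomial order and the spatial dimension.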

Problem

Research questions and friction points this paper is trying to address.

high-order finite element

extreme-scale simulation

computational efficiency

FP64 tensor cores

scientific computing

Innovation

Methods, ideas, or system contributions that make the work stand out.

FP64 tensor cores

high-order finite element

kernel fusion