Abstract
High-speed chemically reacting flows present significant computational challenges because of their disparate spatial and temporal scales, and stiff chemistry often dominates simulation time. Although modern scientific codes on supercomputers reach exascale performance by leveraging graphics processing units (GPUs), existing GPU-based compressible combustion solvers face critical limitations in memory management, load balancing, and handling the highly localized nature of chemical reactions. To this end, we present a high-performance compressible reacting flow solver built on the AMReX framework and optimized for multi-GPU settings. Our approach addresses three GPU performance bottlenecks: memory access patterns, through column-major storage optimization; computational workload variability, through a bulk-sparse integration strategy for chemical kinetics; and multi-GPU load distribution for adaptive mesh refinement applications. The solver also adapts existing matrix-based chemical kinetics formulations to multigrid contexts. Using representative combustion applications, including hydrogen-air detonations and jet-in-supersonic-crossflow configurations, we demonstrate 2–5$\times$ performance improvements over initial GPU implementations, with near-ideal weak scaling across 1–96 NVIDIA H100 GPUs. Roofline analysis reveals substantial improvements in arithmetic intensity for both the convection ($\sim 10\times$) and chemistry ($\sim 4\times$) routines, confirming efficient utilization of GPU memory bandwidth and computational resources.
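The abstract names column-major storage and a bulk-sparse chemistry strategy without giving their details. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes the species state is a flat array in which the cell index varies fastest (so that, on a GPU with one thread per cell, neighbouring threads read neighbouring addresses), and that "bulk-sparse" means gathering only the chemically active cells into a dense batch, integrating that batch, and scattering the results back. The function names, the activation threshold `T_ignite`, and the toy one-step first-order kinetics with rate `k` are hypothetical stand-ins for a real stiff ODE solver.

```python
import math


def idx(species: int, cell: int, ncells: int) -> int:
    # Column-major (structure-of-arrays) offset: the cell index varies fastest,
    # so consecutive cells map to consecutive memory locations.
    return species * ncells + cell


def bulk_sparse_step(Y, T, ncells, nspec, dt, T_ignite=1000.0, k=5.0):
    """Integrate chemistry only in 'active' cells (hypothetical criterion:
    T > T_ignite). Returns the number of cells integrated. Assumes nspec >= 2."""
    # 1. Select active cells; inactive cells are skipped entirely.
    active = [c for c in range(ncells) if T[c] > T_ignite]
    n = len(active)
    if n == 0:
        return 0
    # 2. Gather the active cells into a compact batch, also column-major.
    batch = [0.0] * (nspec * n)
    for s in range(nspec):
        for j, c in enumerate(active):
            batch[s * n + j] = Y[idx(s, c, ncells)]
    # 3. Toy kinetics: species 0 -> species 1 at first-order rate k, solved
    #    exactly over dt (placeholder for a stiff implicit integrator).
    decay = math.exp(-k * dt)
    for j in range(n):
        y0 = batch[0 * n + j]
        batch[0 * n + j] = y0 * decay
        batch[1 * n + j] += y0 * (1.0 - decay)
    # 4. Scatter the integrated batch back into the full state array.
    for s in range(nspec):
        for j, c in enumerate(active):
            Y[idx(s, c, ncells)] = batch[s * n + j]
    return n


# Usage: four cells, two species (0 = fuel, 1 = product); only the two hot
# cells (indices 1 and 3) are integrated.
ncells, nspec = 4, 2
T = [300.0, 1500.0, 800.0, 2000.0]
Y = [1.0] * ncells + [0.0] * ncells
n_active = bulk_sparse_step(Y, T, ncells, nspec, dt=0.1)
```

Compacting the active cells means every work item in the batch performs real chemistry, instead of launching work over the whole grid and leaving threads idle where reactions are negligible; the gather and scatter cost is the price paid for that density.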