🤖 AI Summary
This work identifies critical performance bottlenecks of Adaptive Mesh Refinement (AMR) on CPU-GPU heterogeneous platforms: small patch sizes and deep AMR levels severely degrade GPU utilization, exacerbating communication overhead, serialization latency, and memory pressure. Leveraging the Parthenon framework and the Parthenon-VIBE benchmark, we conduct fine-grained performance profiling to systematically quantify how AMR configuration impacts computational throughput, communication efficiency, and memory footprint—revealing the coupled constraints between per-rank scalability and hardware resource limits. We propose a novel, heterogeneity-aware AMR configuration optimization strategy that preserves resolution fidelity while improving effective GPU compute utilization by 2.3× and reducing GPU memory consumption by 37%. Our findings deliver transferable design principles and empirical validation for deploying AMR applications on next-generation U.S. Department of Energy exascale systems.
📝 Abstract
Hero-class HPC simulations rely on Adaptive Mesh Refinement (AMR) to reduce compute and memory demands while maintaining accuracy. This work analyzes the performance of Parthenon-VIBE, a benchmark built on the block-structured Parthenon AMR framework, on CPU-GPU systems. We show that smaller mesh blocks and deeper AMR levels degrade GPU performance through increased communication, serialization overheads, and inefficient GPU utilization. Through detailed profiling, we identify the sources of these inefficiencies, including low GPU occupancy and memory-access bottlenecks. We further analyze per-rank scalability and memory constraints, and propose optimizations that improve GPU throughput and reduce memory footprint. Our insights can inform future AMR deployments on the Department of Energy's upcoming heterogeneous supercomputers.
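To build intuition for why small mesh blocks hurt GPU utilization, a back-of-envelope model helps: if each AMR block maps to roughly one kernel launch with one thread per cell, small blocks launch far too few threads to fill the device. The sketch below is illustrative only and not taken from the paper; the GPU figures (`108` SMs, `2048` resident threads per SM, loosely modeled on an NVIDIA A100) and the one-thread-per-cell mapping are assumptions.

```python
# Illustrative model (assumption, not from the paper): how many AMR mesh
# blocks must be in flight concurrently to saturate a GPU, if each block
# launches one thread per cell.
MAX_RESIDENT_THREADS = 108 * 2048  # assumed: 108 SMs x 2048 threads/SM

def blocks_to_saturate(block_edge_cells: int) -> int:
    """Blocks needed in flight to occupy all resident GPU threads."""
    cells_per_block = block_edge_cells ** 3
    return -(-MAX_RESIDENT_THREADS // cells_per_block)  # ceiling division

for edge in (8, 16, 32, 64, 128):
    print(f"{edge:>3}^3 block ({edge**3:>7} cells): "
          f"{blocks_to_saturate(edge):>4} blocks to saturate")
```

Under these assumed numbers, a 128^3 block fills the device by itself, while 16^3 blocks need dozens resident at once, and each extra block also adds ghost-zone communication and launch overhead, which is the coupling the profiling in this work quantifies.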