AI Summary
Scientific computing in Python is constrained by single-node parallelism, hindering efficient migration of NumPy-based prototypes to multi-GPU supercomputing clusters; meanwhile, traditional HPC abstractions lack resource elasticity and usability. To address this, we propose CharmTyles, a Charm++-based adaptive stencil computation framework supporting multi-node GPU execution. It introduces the first stencil abstraction that unifies NumPy-like syntax with runtime cross-node dynamic reconfiguration. Leveraging distributed memory management, automatic data partitioning, and optimized asynchronous communication, CharmTyles achieves elastic resource scaling with sub-200 ms reconfiguration latency. Evaluation shows significant performance improvements over both domain-specific stencil DSLs and general-purpose NumPy alternatives. CharmTyles effectively bridges the critical gap between rapid scientific prototyping and scalable deployment on exascale-class systems.
Abstract
The scientific computing ecosystem in Python is largely confined to single-node parallelism, creating a gap between high-level prototyping in NumPy and high-performance execution on modern supercomputers. The increasing prevalence of hardware accelerators and the need for energy efficiency have made resource adaptivity a critical requirement, yet traditional HPC abstractions remain rigid. To address these challenges, we present an adaptive, distributed abstraction for stencil computations on multi-node GPUs. This abstraction is built using CharmTyles, a framework based on the adaptive Charm++ runtime, and features a familiar NumPy-like syntax to minimize the porting effort from prototype to production code. We showcase the resource elasticity of our abstraction by dynamically rescaling a running application across a different number of nodes and present a performance analysis of the associated overheads. Furthermore, we demonstrate that our abstraction achieves significant performance improvements over both a specialized, high-performance stencil DSL and a generalized NumPy replacement.