Towards the Democratization and Standardization of Dynamic Resources with MPI Spawning

📅 2026-04-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses three obstacles to dynamic resource management in high-performance computing: inconsistent interfaces, limited flexibility in how reconfiguration is performed, and poor integration with resource management systems. To overcome them, the authors propose a modular Dynamic Management of Resources (DMR) framework that hides the differences between underlying resource managers behind a unified API. The framework integrates the Proteo reconfiguration engine with MPI spawning to support diverse runtime reconfiguration strategies. This removes the need to respawn every process at each reconfiguration while remaining compatible with mainstream resource managers. An experimental evaluation with the MPDATA solver indicates that the approach improves both programming productivity and runtime performance when resources are reconfigured dynamically.
📝 Abstract
This paper presents an efficient tool for managing dynamic resources in production high-performance computing (HPC) environments, with a focus on flexibility, adaptability, and ease of use. We introduce a unified dynamic resource management application programming interface (API) that supports a wide range of HPC applications and lets them integrate seamlessly without interacting directly with the Dynamic Management of Resources (DMR) framework. The DMR framework, which evolved from DMRlib, now supports various dynamic resource managers and includes the Proteo reconfiguration engine to broaden its malleability strategies. This integration addresses the limitations of earlier approaches, which either respawned all processes at each reconfiguration or lacked resource management system (RMS) support, by allowing diverse reconfiguration methods. The paper also demonstrates the solution's performance and coding productivity with the MPDATA (Multidimensional Positive Definite Advection Transport Algorithm) application. Key contributions include an enhanced, modular DMR framework that supports different reconfiguration managers; an upgraded DMRlib that integrates the Proteo reconfiguration engine and offers an extensive set of reconfiguration strategies; and a malleable version of the MPDATA solver.
Problem

Research questions and friction points this paper is trying to address.

Dynamic Resource Management
High-Performance Computing
Malleability
MPI Spawning
Reconfiguration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Resource Management
Malleability
MPI Spawning
Proteo Reconfiguration Engine
DMRlib