🤖 AI Summary
To address the low resource utilization, inflexible scheduling, and lack of elasticity that MPI jobs exhibit under dynamic workloads in HPC clusters, this paper proposes a runtime optimization mechanism that deeply integrates MPI process malleability into the resource scheduling layer. We extend MPICH with a lightweight process remapping protocol, co-design a load-aware scheduler with a communication-topology-preserving migration algorithm, and natively integrate the system into Slurm. This enables fine-grained, low-overhead process scaling and cross-node migration, overcoming the constraints of the traditional static MPI execution model. Evaluation on a production HPC cluster demonstrates an average 37% improvement in resource utilization, a 29% reduction in job completion time, and communication interruption durations under 50 ms—significantly outperforming both static allocation and state-of-the-art elastic MPI solutions.
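To give a flavor of what a load-aware elasticity decision might look like, here is a minimal sketch in Python. All names, thresholds, and the decision rule itself are illustrative assumptions for exposition only, not the paper's actual scheduler: the real system operates inside Slurm on live node telemetry and coordinates with the MPICH remapping protocol.

```python
def rescale_decision(node_loads, job_ranks, min_ranks=2,
                     expand_below=0.5, shrink_above=0.9):
    """Hypothetical load-aware rescaling policy for a malleable MPI job.

    node_loads: per-node utilization in [0, 1] (assumed telemetry input).
    job_ranks:  current number of ranks held by the malleable job.
    Returns ('expand', n), ('shrink', n), or ('hold', 0).
    """
    avg = sum(node_loads) / len(node_loads)
    idle = sum(1 for load in node_loads if load < expand_below)
    if avg > shrink_above and job_ranks > min_ranks:
        # Cluster saturated: release a fraction of ranks so queued jobs can start.
        return ("shrink", max(1, job_ranks // 4))
    if idle > 0 and avg < expand_below:
        # Spare capacity: grow the job onto the lightly loaded nodes.
        return ("expand", idle)
    return ("hold", 0)
```

In the paper's design, a decision like `("expand", n)` would then be handed to the topology-preserving migration algorithm to choose *which* nodes receive new ranks, keeping communicating ranks close; the sketch above deliberately stops at the when-to-rescale question.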