🤖 AI Summary
Existing AllReduce implementations in MPICH must semantically match the canonical “Reduce-then-Broadcast” specification, yet their diverse, highly concurrent designs impede rigorous correctness assurance.
Method: The authors extract three representative AllReduce algorithms directly from the MPICH source code and refactor them into standalone, analyzable models. They then formally verify two of them (Bruck and Recursive Doubling) end to end using CIVL, the Concurrency Intermediate Verification Language.
Contribution: This work presents the first machine-checkable correctness proofs for core MPICH AllReduce algorithms, demonstrating the feasibility of applying formal methods to verify large-scale MPI communication primitives. It establishes a methodological foundation and practical blueprint for the trustworthy evolution of high-performance computing libraries.
📝 Abstract
We describe a challenge problem for verification based on the MPICH implementation of MPI. The MPICH implementation includes several algorithms for allreduce, all of which should be functionally equivalent to reduce followed by broadcast. We created standalone versions of three algorithms and verified two of them using CIVL.
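To make the equivalence property concrete, the following is a minimal Python sketch, not the MPICH code and not the CIVL model from the paper. It simulates ranks as list entries and compares a toy recursive-doubling allreduce against reduce followed by broadcast. The function names, the restriction to a power-of-two number of ranks, and the assumption that the operator is associative are all choices made for this illustration only.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")


def reduce_then_broadcast(values: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Reference behaviour: fold all contributions in rank order, then copy to every rank."""
    total = reduce(op, values)
    return [total] * len(values)


def recursive_doubling_allreduce(values: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Toy simulation of recursive doubling (hypothetical helper, power-of-two ranks only).

    values[r] is the contribution of rank r. In each round, rank r exchanges its
    partial result with rank r XOR distance and combines the two, keeping the
    lower-ranked operand on the left so that rank order is preserved.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch only handles a power-of-two number of ranks")
    current = list(values)
    distance = 1
    while distance < n:
        nxt = []
        for rank in range(n):
            partner = rank ^ distance
            if partner < rank:
                nxt.append(op(current[partner], current[rank]))
            else:
                nxt.append(op(current[rank], current[partner]))
        current = nxt
        distance *= 2
    return current


if __name__ == "__main__":
    data = ["a", "b", "c", "d"]
    concat = lambda x, y: x + y  # associative but not commutative
    print(recursive_doubling_allreduce(data, concat))
    print(reduce_then_broadcast(data, concat))
```

Checking a handful of inputs this way is only testing. The verification described in the abstract aims to establish the same equivalence for all inputs within the bounds that the tool explores.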