🤖 AI Summary
To address the low reusability of HPC benchmarks, poor cross-platform portability, and inefficient resource validation, this paper proposes the "benchmark carpentry" paradigm, a lightweight, reusable experiment execution framework. Methodologically, it integrates Cloudmesh's experiment executor with HPE SmartSim, incorporating standardized workflow templates, AI/ML–simulation coupling mechanisms, and a unified experiment management interface. Its key contribution is the first application of craftsmanship principles to benchmarking process design, enabling automated, cross-domain, cross-architecture benchmark deployment and capability assessment. Evaluated on representative scientific computing workloads, including cloud masking analysis, seismic forecasting, and CFD surrogate modeling, the framework achieves ≥92% workflow reproducibility and reduces average deployment time by 68%, significantly improving resource configuration efficiency. It establishes a scalable, community-driven paradigm for HPC capability validation.
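To give a rough sense of what a workflow template and experiment executor do in practice, the sketch below expands a small parameter grid into per-run batch scripts rendered from a single template. The script text, parameter names, and file layout are illustrative assumptions only and do not reflect the actual Cloudmesh-EE or SmartSim interfaces.

```python
# Minimal sketch of the "experiment executor" idea: expand a small
# parameter grid into per-run job scripts rendered from one template.
# All names (template text, parameters, output layout) are hypothetical.
from itertools import product
from pathlib import Path
from string import Template

TEMPLATE = Template(
    "#!/bin/bash\n"
    "#SBATCH --gres=gpu:${gpu}:1\n"
    "python benchmark.py --epochs ${epochs} --batch-size ${batch}\n"
)

# Parameter space describing one benchmark "workflow template".
grid = {"epochs": [10, 50], "batch": [32, 64], "gpu": ["a100"]}

outdir = Path("jobs")
outdir.mkdir(exist_ok=True)

# Generate one runnable job script per parameter permutation.
for i, values in enumerate(product(*grid.values())):
    params = dict(zip(grid.keys(), values))
    script = outdir / f"job_{i}.sh"
    script.write_text(TEMPLATE.substitute(params))
    print(f"wrote {script} with {params}")
```

Keeping the template separate from the parameter grid is what makes such a design adaptable across applications and architectures: only the grid and the command line change between benchmarks.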
📝 Abstract
A key hurdle is demonstrating a compute resource's capabilities when only a limited number of benchmarks is available. We propose workflow templates as a solution: adaptable designs that can be customized for specific scientific applications. Our paper identifies common usage patterns for these templates, drawn from decades of HPC experience, including recent work with the MLCommons Science working group.
We found that focusing on simple experiment management tools within the broader computational workflow improves adaptability, especially in education. This concept, which we term benchmark carpentry, is validated by two independent tools: Cloudmesh's Experiment Executor and Hewlett Packard Enterprise's SmartSim. Both frameworks, which overlap significantly in functionality, have been tested across various scientific applications, including cloud masking, earthquake prediction, simulation-AI/ML interactions, and the development of computational fluid dynamics surrogates.
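For the simulation-AI/ML interaction side, a minimal sketch along the lines of SmartSim's documented Experiment workflow is shown below. The launcher choice, the `simulation.py` driver, and the run settings are placeholder assumptions, and the exact API can differ between SmartSim versions; this is not the paper's reference implementation.

```python
# Sketch of launching a simulation as a managed experiment entity,
# in the spirit of SmartSim's Experiment API (details vary by version).
# "simulation.py" stands in for a real solver or benchmark driver.
from smartsim import Experiment

exp = Experiment("benchmark-carpentry-demo", launcher="local")

# Describe how the simulation executable is invoked.
run_settings = exp.create_run_settings(exe="python", exe_args="simulation.py")

# Register the simulation with the experiment and run it to completion.
model = exp.create_model("cfd_surrogate_run", run_settings)
exp.start(model, block=True)

# Query the final status for bookkeeping and reproducibility records.
print(exp.get_status(model))
```

In this style of coupling, the same experiment object can also manage an in-memory datastore and AI/ML training components alongside the simulation, which is what enables surrogate-model workflows such as the CFD case mentioned above.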