🤖 AI Summary
This work addresses the reproducibility challenges posed by the rapid evolution of large-scale models and high-performance computing systems. Building on their earlier conceptual work on systematic benchmarking workflows, the authors present concepts for an automated benchmarking pipeline that borrows from software engineering, particularly continuous integration, and adds user-agnostic operation and continuous benchmarking. Designed for customization and community collaboration, the pipeline aims to make benchmarking results reproducible and reusable, with a view towards neuroscience and artificial intelligence research.
📝 Abstract
Drawing on ideas from continuous integration, we present concepts for an automated benchmarking pipeline for high-performance applications. Customization and collaboration have been key design goals, owing to the requirements of research-software development as a continuous community effort. We have extended our previous conceptual work on systematic benchmarking workflows with user-agnostic operation and continuous benchmarking. This fosters reproducibility and re-use of benchmarking results to ensure sustainable technological progress. We provide software-engineering solutions to keep pace with the rapid evolution of both large-scale models and high-performance computing systems, with a view towards the scientific domains of neuroscience and artificial intelligence.