🤖 AI Summary
Traditional HPC architectures are vertically integrated and inflexible, and they struggle to support multidisciplinary, heterogeneous computing workloads. Method: This project introduces Alps, a resource-decoupled HPC infrastructure built around the software-defined cluster (vCluster) concept. vClusters separate infrastructure, services, and user environments into distinct layers, which allows HPC and cloud computing approaches to be combined. Using Slingshot high-speed interconnects, Alps integrates heterogeneous CPU/GPU compute resources with modular storage to form a composable, customizable research platform. Contribution/Results: Alps has been deployed for domain-specific applications, including numerical weather prediction and AI, with reported gains in resource utilization and in adaptability to diverse scientific workloads. It offers a scalable, flexible model for scientific computing infrastructure.
📝 Abstract
The Swiss National Supercomputing Centre (CSCS) has a long-standing tradition of delivering top-tier high-performance computing systems, exemplified by the Piz Daint supercomputer. However, the increasing diversity of scientific needs has exposed limitations in traditional vertically integrated HPC architectures, which often lack flexibility and composability. To address these challenges, CSCS developed Alps, a next-generation HPC infrastructure designed with a transformative principle: resources operate as independent endpoints within a high-speed network. This architecture enables the creation of independent tenant-specific and platform-specific services, tailored to diverse scientific requirements.
Alps incorporates heterogeneous hardware, including CPUs and GPUs, interconnected by a high-performance Slingshot network, and offers a modular storage system. A key innovation is the versatile software-defined cluster (vCluster) technology, which bridges cloud and HPC paradigms. By abstracting infrastructure, service management, and user environments into distinct layers, vClusters allow for customized platforms that support diverse workloads. Current platforms on Alps serve various scientific domains, including numerical weather prediction and AI research.
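
The three-layer separation described above can be illustrated with a minimal sketch. This is not the Alps or vCluster implementation: every class, field, and value below (`InfrastructureLayer`, `ServiceLayer`, `UserEnvironmentLayer`, `VCluster`, the node counts, and the placeholder names) is a hypothetical stand-in chosen only to show how independent layers might be composed into tenant-specific platforms.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class InfrastructureLayer:
    """Hypothetical: the hardware endpoints assigned to a platform."""
    node_type: str              # e.g. "cpu" or "gpu"
    node_count: int
    storage_mounts: tuple = ()  # modular storage attached to the platform


@dataclass(frozen=True)
class ServiceLayer:
    """Hypothetical: platform services, managed independently of hardware."""
    scheduler: str
    services: tuple = ()


@dataclass(frozen=True)
class UserEnvironmentLayer:
    """Hypothetical: the software environment exposed to users."""
    name: str
    packages: tuple = ()


@dataclass(frozen=True)
class VCluster:
    """Hypothetical composition of the three layers into one platform."""
    name: str
    infrastructure: InfrastructureLayer
    services: ServiceLayer
    environment: UserEnvironmentLayer

    def describe(self) -> str:
        return (
            f"{self.name}: {self.infrastructure.node_count} x "
            f"{self.infrastructure.node_type} nodes, "
            f"scheduler={self.services.scheduler}, "
            f"env={self.environment.name}"
        )


# All values are placeholders, not real Alps configuration.
shared_services = ServiceLayer(scheduler="batch-placeholder", services=("monitoring",))

weather = VCluster(
    name="weather-platform",
    infrastructure=InfrastructureLayer("cpu", 4, ("/scratch",)),
    services=shared_services,
    environment=UserEnvironmentLayer("nwp-env", ("forecast-model",)),
)

# A second platform swaps the infrastructure and user environment
# while reusing the same service layer unchanged.
ai = replace(
    weather,
    name="ai-platform",
    infrastructure=InfrastructureLayer("gpu", 8, ("/scratch",)),
    environment=UserEnvironmentLayer("ml-env", ("training-framework",)),
)

print(weather.describe())
print(ai.describe())
```

Because each layer is an immutable value of its own, one layer can be replaced without touching the others, which is the property the abstract attributes to vClusters.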