Alps, a versatile research infrastructure

📅 2025-07-03
🤖 AI Summary
Problem: Traditional HPC architectures are vertically integrated and inflexible, so they struggle to support multidisciplinary, heterogeneous computing workloads. Method: The paper introduces Alps, a resource-decoupled HPC infrastructure built around the software-defined cluster (vCluster) paradigm. vClusters separate infrastructure, service management, and user environments into distinct layers, which lets one system combine HPC and cloud computing paradigms. Using the Slingshot high-speed interconnect, Alps combines heterogeneous CPU and GPU compute resources with modular storage to form a composable, customizable research platform. Contribution/Results: Alps has been deployed for domain-specific platforms, including numerical weather prediction and AI research, demonstrating improved resource utilization and adaptability to diverse scientific workloads. It offers a scalable, flexible model for scientific computing infrastructure.

📝 Abstract
The Swiss National Supercomputing Centre (CSCS) has a long-standing tradition of delivering top-tier high-performance computing systems, exemplified by the Piz Daint supercomputer. However, the increasing diversity of scientific needs has exposed limitations in traditional vertically integrated HPC architectures, which often lack flexibility and composability. To address these challenges, CSCS developed Alps, a next-generation HPC infrastructure designed with a transformative principle: resources operate as independent endpoints within a high-speed network. This architecture enables the creation of independent tenant-specific and platform-specific services, tailored to diverse scientific requirements. Alps incorporates heterogeneous hardware, including CPUs and GPUs, interconnected by a high-performance Slingshot network, and offers a modular storage system. A key innovation is the versatile software-defined cluster (vCluster) technology, which bridges cloud and HPC paradigms. By abstracting infrastructure, service management, and user environments into distinct layers, vClusters allow for customized platforms that support diverse workloads. Current platforms on Alps serve various scientific domains, including numerical weather prediction and AI research.

Problem

Research questions and friction points this paper is trying to address.

Addressing limitations of traditional HPC architectures
Enabling flexible composable scientific computing resources
Supporting diverse workloads across scientific domains

Innovation

Methods, ideas, or system contributions that make the work stand out.

Independent endpoints in high-speed network
Heterogeneous hardware with Slingshot network
Software-defined vCluster technology
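
The three ideas listed above (independent endpoints, heterogeneous hardware, and layered vClusters) can be pictured with a small data model. This is only an illustrative sketch and not the Alps or vCluster implementation: the class names `Endpoint` and `VCluster`, the function `compose_vcluster`, and all node, service, and environment names are hypothetical. It assumes a shared pool of endpoints from which a tenant platform selects resources by kind, and it does not model exclusive allocation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """One independent resource on the network (hypothetical model)."""
    name: str
    kind: str  # "cpu", "gpu", or "storage"


@dataclass
class VCluster:
    """A tenant platform split into the three layers named in the abstract."""
    tenant: str
    infrastructure: list[Endpoint]  # layer 1: endpoints attached to the platform
    services: list[str]             # layer 2: service management
    user_environment: str           # layer 3: what users of the platform see


def compose_vcluster(tenant, pool, wanted, services, user_environment):
    """Select endpoints by kind from a shared pool.

    `wanted` maps an endpoint kind to the number of endpoints requested.
    The pool itself is left unchanged (no exclusive allocation is modeled).
    """
    chosen = []
    for kind, count in wanted.items():
        matches = [e for e in pool if e.kind == kind][:count]
        if len(matches) < count:
            raise ValueError(f"not enough {kind!r} endpoints for {tenant}")
        chosen.extend(matches)
    return VCluster(tenant, chosen, list(services), user_environment)


# Hypothetical pool mixing the hardware kinds the abstract mentions.
pool = [
    Endpoint("gpu-node-0", "gpu"),
    Endpoint("gpu-node-1", "gpu"),
    Endpoint("cpu-node-0", "cpu"),
    Endpoint("store-0", "storage"),
]

weather = compose_vcluster(
    tenant="weather",
    pool=pool,
    wanted={"gpu": 1, "storage": 1},
    services=["scheduler", "monitoring"],
    user_environment="nwp-toolchain",
)
print([e.name for e in weather.infrastructure])  # ['gpu-node-0', 'store-0']
```

Keeping the three layers as separate fields mirrors the separation the abstract describes: a second platform could reuse the same pool with different services and a different user environment.
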
Maxime Martinasso
ETH Zurich, Swiss National Supercomputing Centre (CSCS), Lugano, Switzerland
Mark Klein
MIT Center for Collective Intelligence
collective intelligence, artificial intelligence, multi-agent systems, sustainability
Thomas C. Schulthess
ETH Zurich, Swiss National Supercomputing Centre (CSCS), Lugano, Switzerland