AI Factories: It's time to rethink the Cloud-HPC divide

📅 2025-09-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

High-performance computing (HPC) systems deliver exceptional computational power but lack the usability, accessibility, and service-oriented capabilities of cloud platforms, which hinders their use for public-facing AI inference and agentic applications. Conversely, the cloud-native technologies that AI developers rely on (e.g., Kubernetes, object storage) are often difficult to integrate into traditional HPC environments. To bridge this gap, the article advocates a dual-stack approach for AI Factories that integrates both HPC and cloud-native technologies within the same supercomputer, combining high performance and hardware acceleration with ease of use and service-oriented front-ends. It examines the convergence from both directions: the cloud challenges of HPC (Serverless HPC) and the HPC challenges of cloud technologies (High-performance Cloud). The context is the EuroHPC Joint Undertaking's AI Factories, which are built atop existing HPC supercomputers as part of the push toward sovereign AI infrastructure.

📝 Abstract

The strategic importance of artificial intelligence is driving a global push toward Sovereign AI initiatives. National governments are increasingly developing dedicated infrastructures, called AI Factories (AIF), to achieve technological autonomy and secure the resources necessary to sustain robust local digital ecosystems. In Europe, the EuroHPC Joint Undertaking is investing hundreds of millions of euros into several AI Factories, built atop existing high-performance computing (HPC) supercomputers. However, while HPC systems excel in raw performance, they are not inherently designed for usability, accessibility, or serving as public-facing platforms for AI services such as inference or agentic applications. In contrast, AI practitioners are accustomed to cloud-native technologies like Kubernetes and object storage, tools that are often difficult to integrate within traditional HPC environments. This article advocates for a dual-stack approach within supercomputers: integrating both HPC and cloud-native technologies. Our goal is to bridge the divide between HPC and cloud computing by combining high performance and hardware acceleration with ease of use and service-oriented front-ends. This convergence allows each paradigm to amplify the other. To this end, we will study the cloud challenges of HPC (Serverless HPC) and the HPC challenges of cloud technologies (High-performance Cloud).

Problem

Research questions and friction points this paper is trying to address.

Bridging HPC and cloud-native technologies for AI

Enhancing usability of HPC systems for AI services

Integrating cloud accessibility with supercomputing performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-stack approach combining HPC and cloud

Integrating Kubernetes and object storage technologies

Bridging high performance with service-oriented front-ends
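
To make the dual-stack idea more concrete, here is a minimal Python sketch of a placement rule that sends a workload either to the HPC side or to the cloud-native side of the same machine. It is an illustration only and is not taken from the paper: the names `Stack`, `Workload`, and `choose_stack`, the node threshold, and the rule itself are all hypothetical assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Stack(Enum):
    """The two sides of a dual-stack system (labels are hypothetical)."""
    HPC = "hpc"      # batch-scheduled, multi-node jobs
    CLOUD = "cloud"  # Kubernetes-style, service-oriented workloads


@dataclass(frozen=True)
class Workload:
    name: str
    long_running_service: bool  # e.g. an inference endpoint that must stay up
    nodes: int                  # number of accelerator nodes requested


def choose_stack(w: Workload, multi_node_threshold: int = 2) -> Stack:
    """Toy placement rule; the threshold is an illustrative assumption."""
    # Public-facing services need a stable endpoint, so they go to the cloud side.
    if w.long_running_service:
        return Stack.CLOUD
    # Large finite jobs (e.g. distributed training) fit a batch scheduler.
    if w.nodes >= multi_node_threshold:
        return Stack.HPC
    # Small interactive or single-node work defaults to the cloud side.
    return Stack.CLOUD


if __name__ == "__main__":
    for w in (
        Workload("pretraining-run", long_running_service=False, nodes=64),
        Workload("chat-inference", long_running_service=True, nodes=4),
        Workload("notebook", long_running_service=False, nodes=1),
    ):
        print(w.name, "->", choose_stack(w).value)
```

A real AI Factory would presumably base this decision on far richer signals (queue state, data locality, tenancy, quotas). The sketch only shows that both stacks can sit behind one entry point.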
