Combining Serverless and High-Performance Computing Paradigms to support ML Data-Intensive Applications

📅 2025-11-15
🤖 AI Summary
Serverless architectures suffer from high communication overhead and poor scalability when running data-intensive machine learning workloads. To address this, the paper proposes a serverless framework inspired by high-performance computing (HPC). Methodologically, it introduces direct communication between functions through NAT traversal based on TCP hole punching, implements a lightweight serverless communicator, and integrates the Cylon distributed dataframe library with an FMI-inspired heuristic communication scheduling model. This design targets decentralized, low-latency, high-throughput distributed data processing in cloud-native environments, with functions exchanging data directly instead of through external storage. In strong scaling experiments, AWS Lambda with this communicator is reported to perform within one percent of serverful AWS (EC2) instances and dedicated HPC clusters, which brings serverless communication efficiency and scalability close to HPC levels.

📝 Abstract
Data is found everywhere, from health and human infrastructure to the surge of sensors and the proliferation of internet-connected devices. To meet the challenge of processing this data, the data engineering field has expanded significantly in recent years in both research and industry. Traditionally, data engineering, machine learning, and AI workloads have run on large clusters within data center environments, requiring substantial investment in hardware and maintenance. With the rise of the public cloud, it is now possible to run large applications across nodes without owning or maintaining hardware. Serverless functions such as AWS Lambda provide horizontal scaling and precise billing without the hassle of managing traditional cloud infrastructure. However, when processing large datasets, users often rely on external storage options that are significantly slower than the direct communication typical of HPC clusters. We introduce Cylon, a high-performance distributed data frame solution that has shown promising results for data processing using Python. We describe how we took inspiration from the FMI library and designed a serverless communicator to tackle the communication and performance issues associated with serverless functions. With our design, which implements direct communication via NAT traversal with TCP hole punching, we demonstrate that the performance of AWS Lambda in strong scaling experiments falls within one percent of serverful AWS (EC2) and HPC clusters.
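
One way to read the strong scaling comparison in the abstract is as a relative runtime gap between platforms at a fixed problem size. The sketch below shows the standard speedup and strong scaling efficiency formulas together with such a gap. The function names and every timing are hypothetical placeholders chosen for illustration; they are not measurements from the paper.

```python
def speedup(t_base: float, t_p: float) -> float:
    """Speedup of a run taking t_p seconds over a baseline taking t_base seconds."""
    return t_base / t_p


def strong_scaling_efficiency(t_base: float, t_p: float,
                              p_base: int, p: int) -> float:
    """Efficiency when the problem size is fixed and workers grow from p_base to p.

    A value of 1.0 means ideal scaling: p / p_base times the workers
    gives p / p_base times the speed.
    """
    return (t_base * p_base) / (t_p * p)


def relative_gap(t_candidate: float, t_reference: float) -> float:
    """Relative runtime difference of a candidate platform versus a reference."""
    return abs(t_candidate - t_reference) / t_reference


# Hypothetical timings in seconds for the same fixed-size job (assumed values).
t_one_worker = 128.0      # baseline on 1 worker
t_reference_8 = 20.0      # reference platform on 8 workers
t_serverless_8 = 20.2     # serverless platform on 8 workers

print("speedup:", speedup(t_one_worker, t_reference_8))
print("efficiency:", strong_scaling_efficiency(t_one_worker, t_reference_8, 1, 8))
print("gap:", relative_gap(t_serverless_8, t_reference_8))
```

With these placeholder numbers the gap is about one percent, which is the kind of closeness the abstract describes between AWS Lambda and the serverful baselines.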

Problem

Research questions and friction points this paper is trying to address.

Bridging the performance gap between serverless and HPC for data processing
Addressing slow communication between serverless functions when processing large datasets
Improving distributed data frame performance using direct communication techniques

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining serverless and HPC paradigms for ML
Using the Cylon distributed data frame solution
Implementing direct communication via NAT traversal with TCP hole punching
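
To make the last item concrete, here is a minimal sketch of the general TCP hole punching idea, not the paper's communicator. It assumes both peers have already learned each other's public address and port through some rendezvous service (not shown), and that each peer calls `punch` at roughly the same time. The names `make_punch_socket` and `punch`, and all parameter defaults, are hypothetical.

```python
import socket
import time


def make_punch_socket(local_port: int) -> socket.socket:
    """Create a TCP socket bound to local_port with address reuse enabled.

    Reusing the local port matters because the NAT mapping created when
    contacting the rendezvous service is tied to that port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("0.0.0.0", local_port))
    return sock


def punch(local_port: int, peer_addr: tuple,
          attempts: int = 20, delay: float = 0.25,
          timeout: float = 1.0) -> socket.socket:
    """Repeatedly connect to the peer's public endpoint from a fixed local port.

    Each outgoing SYN opens (or refreshes) a mapping in the local NAT. When
    both peers do this at about the same time, one side's SYN can pass through
    the mapping the other side just opened, and the connection is established.
    """
    for _ in range(attempts):
        sock = make_punch_socket(local_port)
        sock.settimeout(timeout)
        try:
            sock.connect(peer_addr)
            sock.settimeout(None)  # back to blocking mode for normal use
            return sock
        except OSError:
            # Refused or timed out: the peer's mapping is not open yet.
            sock.close()
            time.sleep(delay)
    raise ConnectionError(f"could not reach {peer_addr} after {attempts} attempts")


# Usage (assumed values, each peer runs this with the other's public endpoint):
#   conn = punch(local_port=40000, peer_addr=("203.0.113.7", 40001))
#   conn.sendall(b"hello")
```

Whether this succeeds depends on the NAT behavior on both sides; the sketch omits the rendezvous exchange, listening sockets, and any fallback path.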