iFast: Host-Side Logging for Scientific Applications

📅 2024-01-26
🤖 AI Summary
Scientific applications face I/O throughput bottlenecks in heterogeneous cloud storage environments (burst buffers, cloud-based parallel file systems, and object stores), where conventional optimization techniques fail to deliver high throughput, crash consistency, and application transparency at the same time. This paper proposes iFast, a lightweight, distributed host-side logging mechanism that lets unmodified MPI applications write transparently and at high performance to object stores (e.g., Amazon S3). The approach integrates MPI-IO interception and redirection, distributed buffer management, and a cross-platform storage adaptation layer. According to the authors, it is the first solution to support writes through standard MPI-IO to production-grade S3. Experimental evaluation across multiple cloud HPC scenarios shows a 13–26% reduction in end-to-end execution time for three representative scientific applications. Data is stored persistently in a shareable format, and crash consistency semantics are preserved without modifying the application or the MPI implementation.

📝 Abstract
We have seen an increase in the heterogeneity of storage technologies potentially available to scientific applications, such as burst buffers, managed cloud parallel file systems (PFS), and object stores. However, those applications cannot easily utilize those technologies, because they are designed for traditional HPC systems that offer very high remote storage and network bandwidth. We present iFast, a new distributed host-side logging approach to transparently accelerating scientific applications. iFast has a strong emphasis on deployability, supporting unmodified MPI applications with unmodified MPI implementations while preserving crash consistency semantics. We evaluate iFast on traditional HPC, cloud HPC, local cluster, and a hybrid of both, using three scientific applications. iFast reduces end-to-end execution time by 13-26% for popular scientific applications on the cloud. We show, for the first time, how an application on a recent production HPC system can write data to S3 storage through fully fledged MPI-IO, in a readily shareable format.

Problem

Research questions and friction points this paper is trying to address.

Optimizing output-intensive applications for low storage throughput
Satisfying synchronization requirements in modern computing environments
Accelerating scientific applications while preserving crash consistency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Host-side logging for parallel checkpoints
Transparent acceleration of unmodified MPI applications
Distributed approach preserving crash consistency semantics
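
The general idea behind host-side logging with crash consistency can be illustrated with a minimal, single-process sketch. This is not iFast's implementation: the record layout, the function names (`log_write`, `read_log`, `replay`), and the use of a local file as a stand-in for remote storage are all assumptions made for illustration. Each write is first appended to a log on fast local storage and forced to disk; later, the log is replayed to the slower destination. A checksum on each record lets replay skip a record that was torn by a crash.

```python
import os
import struct
import zlib

# Hypothetical record header: target offset (8 bytes), payload length (4 bytes),
# CRC32 of the payload (4 bytes). This layout is an assumption, not iFast's format.
HEADER = struct.Struct("<QII")


def log_write(log_path, offset, data):
    """Append one write record to the local log and force it to stable storage."""
    record = HEADER.pack(offset, len(data), zlib.crc32(data)) + data
    with open(log_path, "ab") as log:
        log.write(record)
        log.flush()
        os.fsync(log.fileno())


def read_log(log_path):
    """Yield (offset, data) for each intact record; stop at a torn or corrupt tail."""
    with open(log_path, "rb") as log:
        while True:
            header = log.read(HEADER.size)
            if len(header) < HEADER.size:
                return
            offset, length, crc = HEADER.unpack(header)
            data = log.read(length)
            if len(data) < length or zlib.crc32(data) != crc:
                return
            yield offset, data


def replay(log_path, dest_path):
    """Apply logged writes, in log order, to the destination file."""
    mode = "r+b" if os.path.exists(dest_path) else "w+b"
    with open(dest_path, mode) as dest:
        for offset, data in read_log(log_path):
            dest.seek(offset)
            dest.write(data)
        dest.flush()
        os.fsync(dest.fileno())
```

In this sketch the application-visible write returns as soon as the local append is durable, so the slow transfer to remote storage is off the critical path. A distributed variant would keep one such log per host and would need to coordinate replay order across ranks, which the sketch does not attempt.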