🤖 AI Summary
Problem: With AI and high-resolution simulations increasingly driving HPC workloads, parallel I/O performance bottlenecks have grown more complex, while existing optimization tools remain fragmented and difficult to choose among. Method: We systematically review 131 publications and, using bibliometric analysis, systematic literature review, and taxonomy modeling, construct the first comprehensive, end-to-end parallel I/O classification framework (a “360° taxonomy”) covering characterization, analysis, and optimization. Our approach integrates cross-platform profiling and tracing tools, including Darshan, Vampir, and Lustre trace, into a unified analytical pipeline. Contribution: We propose the first holistic, cross-layer I/O optimization framework spanning applications, runtime systems, file systems, and hardware; release a structured knowledge graph and open-source classification toolkit; and significantly reduce decision-making overhead in selecting optimization strategies. This work delivers a reusable, scalable methodology for enhancing parallel I/O performance in production HPC environments.
📝 Abstract
Driven by artificial intelligence, data science, and high-resolution simulations, I/O workloads and hardware on high-performance computing (HPC) systems have become increasingly complex. This complexity can lead to large I/O overheads and degrade overall performance. These inefficiencies are often mitigated with tools and techniques for characterizing, analyzing, and optimizing the I/O behavior of HPC applications. However, the sheer number of available tools and techniques makes it difficult to identify the best approach. In response, this paper surveys 131 papers from the ACM Digital Library, IEEE Xplore, and other reputable journals and synthesizes them into a taxonomy of the current landscape of parallel I/O characterization, analysis, and optimization on large-scale HPC systems. We anticipate that this taxonomy will serve as a valuable resource for improving the I/O performance of HPC applications.