DIAL: Decentralized I/O AutoTuning via Learned Client-side Local Metrics for Parallel File System

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing approaches to automatic I/O tuning in parallel file systems rely on global runtime metrics and precise I/O pattern modeling, incurring high overhead and hindering fine-grained dynamic optimization. To address this limitation, this work proposes DIAL, a novel decentralized auto-tuning mechanism wherein each client independently makes tuning decisions based solely on locally observable metrics using a lightweight machine learning model, while collectively adapting to global storage state changes. By eliminating the need for global modeling, DIAL substantially reduces monitoring overhead and enables real-time, dynamic configuration tuning. This approach effectively enhances overall application I/O performance without requiring access to global system information.

📝 Abstract
Enabling efficient, high-performance data access in parallel file systems (PFS) is critical for today's high-performance computing systems. PFS client-side I/O heavily impacts the final I/O performance delivered to individual applications and to the entire system. Autotuning key client-side I/O behaviors has been extensively studied and shows promising results. However, existing work relies heavily on monitoring a large number of global runtime metrics and on accurately modeling applications' I/O patterns. This overhead significantly limits fine-grained, dynamic tuning in practical systems. In this study, we propose DIAL (Decentralized I/O AutoTuning via Learned Client-side Local Metrics), which takes a drastically different approach. Instead of trying to extract the global I/O patterns of applications, DIAL treats each I/O client as an independent unit and tunes its configuration using only its locally observable metrics. With the help of machine learning models, DIAL enables multiple tunable units to make independent but collective decisions, reacting in a timely manner to what is happening in the global storage system and achieving better overall I/O performance for the application.
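The decentralized idea in the abstract can be illustrated with a minimal sketch. It is not DIAL's implementation: the metric fields, the tunable names (`rpcs_in_flight`, `rpc_size_pages`), the candidate list, and the hand-written `toy_model` standing in for a learned model are all assumptions made for illustration. The point it shows is only the structure: each client scores candidate configurations from its own local metrics, with no global state shared between clients.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class LocalMetrics:
    """What one client can observe by itself (hypothetical fields)."""
    throughput_mb_s: float
    avg_latency_ms: float


@dataclass(frozen=True)
class ClientConfig:
    """Client-side tunables (hypothetical names, not taken from the paper)."""
    rpcs_in_flight: int
    rpc_size_pages: int


# A model maps (local metrics, candidate config) to a predicted performance score.
Model = Callable[[LocalMetrics, ClientConfig], float]


def toy_model(metrics: LocalMetrics, config: ClientConfig) -> float:
    """Hand-written stand-in for a learned model (assumption).

    It only looks at latency: low latency rewards more concurrency,
    high latency (a sign of server-side contention) penalizes it.
    """
    congestion = metrics.avg_latency_ms / 10.0
    return config.rpcs_in_flight * (1.0 - congestion) + 0.001 * config.rpc_size_pages


def choose_config(metrics: LocalMetrics,
                  candidates: Sequence[ClientConfig],
                  model: Model) -> ClientConfig:
    """Pick the candidate with the highest predicted score for this client."""
    return max(candidates, key=lambda c: model(metrics, c))


# Hypothetical candidate configurations every client chooses from.
CANDIDATES = (
    ClientConfig(rpcs_in_flight=4, rpc_size_pages=256),
    ClientConfig(rpcs_in_flight=16, rpc_size_pages=256),
    ClientConfig(rpcs_in_flight=32, rpc_size_pages=1024),
)


def tune_all(per_client: Dict[str, LocalMetrics]) -> Dict[str, ClientConfig]:
    """Each client decides independently; nothing is aggregated across clients."""
    return {name: choose_config(m, CANDIDATES, toy_model)
            for name, m in per_client.items()}


# Two clients seeing different local conditions.
decisions = tune_all({
    "client-a": LocalMetrics(throughput_mb_s=800.0, avg_latency_ms=2.0),
    "client-b": LocalMetrics(throughput_mb_s=120.0, avg_latency_ms=25.0),
})
```

Under this toy scoring, `client-a` (low latency) selects the most aggressive candidate and `client-b` (high latency) selects the most conservative one, even though neither sees the other's metrics. A real system would replace `toy_model` with a trained predictor and re-run the decision periodically.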
Problem

Research questions and friction points this paper is trying to address.

parallel file system
I/O autotuning
client-side optimization
global metrics overhead
dynamic tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized Autotuning
Client-side Local Metrics
Parallel File System
Machine Learning
I/O Optimization
Md Hasanur Rashid
Department of Computer and Information Sciences, University of Delaware, Newark, US
Xinyi Li
Department of Electrical and Computer Engineering, Iowa State University, Ames, US
Youbiao He
Iowa State University
artificial intelligence
Forrest Sheng Bao
Department of Electrical and Computer Engineering, Iowa State University, Ames, US
Dong Dai
Associate Professor, University of Delaware
AI4HPC, HPC Storage, HPC I/O