AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage

📅 2026-02-25
🤖 AI Summary
This work addresses the performance degradation that large-scale HPC applications suffer on shared storage systems when I/O bandwidth is allocated unevenly, for example when small jobs monopolize a storage server at the expense of larger ones. To address this, the authors propose AdapTBF, a decentralized, adaptive bandwidth regulation scheme that builds on the Token Bucket Filter (TBF) mechanism and is implemented in the Lustre parallel file system. Unlike conventional static rate limiting, AdapTBF lets applications lend idle bandwidth and borrow it during bursty I/O phases, while preserving fairness and storage efficiency. Experiments with synthetic workloads modeled after real-world scenarios show that AdapTBF improves aggregate throughput, protects the performance of large jobs, and maintains high resource utilization and equitable bandwidth distribution, even under extreme conditions.
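To make the static baseline concrete, here is a minimal Python sketch of a token bucket whose refill rate is set in proportion to each job's compute nodes, the scheme that AdapTBF starts from. It is an illustration under stated assumptions, not Lustre's TBF code: the `TokenBucket` class, the `proportional_rates` helper, and the use of abstract "tokens" as the unit of I/O cost are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical token bucket: refills at `rate` tokens/s, holds at most `depth`."""
    rate: float         # tokens added per second (abstract unit of I/O cost)
    depth: float        # maximum tokens the bucket can hold (burst size)
    tokens: float = 0.0
    last: float = 0.0   # time of the last refill, in seconds

    def refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last = now

    def try_consume(self, cost: float, now: float) -> bool:
        """Admit a request of size `cost` only if enough tokens are available."""
        self.refill(now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # request must wait: the job is at its rate limit


def proportional_rates(server_bw: float, nodes: dict[str, int]) -> dict[str, float]:
    """Hypothetical helper: split server bandwidth in proportion to compute nodes."""
    total = sum(nodes.values())
    return {job: server_bw * n / total for job, n in nodes.items()}
```

For example, `proportional_rates(1000.0, {"small": 1, "large": 9})` gives the single-node job 100 tokens/s and the nine-node job 900 tokens/s. If the large job goes idle, the small job's bucket still refills at only 100 tokens/s, so 900 tokens/s of server capacity sits unused; this is the inefficiency of strict proportional limits that motivates adaptive borrowing.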

📝 Abstract
Modern high-performance computing (HPC) applications run on compute resources but share global storage systems. This design can cause problems when applications consume a disproportionate amount of storage bandwidth relative to their allocated compute resources. For example, an application running on a single compute node can issue many small, random writes and consume excessive I/O bandwidth from a storage server. This can hinder larger jobs that write to the same storage server and are allocated many compute nodes, resulting in significant resource waste. A straightforward solution is to limit each application's I/O bandwidth on storage servers in proportion to its allocated compute resources. This approach has been implemented in parallel file systems using Token Bucket Filter (TBF). However, strict proportional limits often reduce overall I/O efficiency because HPC applications generate short, bursty I/O. Limiting bandwidth can waste server capacity when applications are idle or prevent applications from temporarily using higher bandwidth during bursty phases. We argue that I/O control should maximize per-application performance and overall storage efficiency while ensuring fairness (e.g., preventing small jobs from blocking large-scale ones). We propose AdapTBF, which builds on TBF in modern parallel file systems (e.g., Lustre) and introduces a decentralized bandwidth control approach using adaptive borrowing and lending. We detail the algorithm, implement AdapTBF in Lustre, and evaluate it using synthetic workloads modeled after real-world scenarios. Results show that AdapTBF manages I/O bandwidth effectively while maintaining high storage utilization, even under extreme conditions.
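The borrowing-and-lending idea can be illustrated with a simple per-interval redistribution step. The sketch below is a hypothetical simplification, not the AdapTBF algorithm (which the paper details): it assumes each storage server independently knows every job's proportional share and its demand for the current control interval, pools the unused share of jobs that demand less than their share, and splits that pool among jobs that demand more, in proportion to their base shares. The function name `redistribute` and this weighting rule are illustrative assumptions.

```python
def redistribute(shares: dict[str, float], demand: dict[str, float]) -> dict[str, float]:
    """One control interval of a hypothetical lend/borrow step on one server.

    `shares` are the static proportional rates (assumed positive); `demand` is
    the bandwidth each job would use this interval. Returns the granted rates.
    """
    # Every job first gets up to its own share; unused share is lent to a pool.
    alloc = {j: min(shares[j], demand[j]) for j in shares}
    pool = sum(shares[j] - alloc[j] for j in shares)
    # Jobs that want more than their share are the borrowers.
    hungry = {j for j in shares if demand[j] > shares[j]}

    # Hand out the pool in rounds, weighted by base share, never granting a
    # borrower more than its remaining demand.
    while pool > 1e-9 and hungry:
        weight = sum(shares[j] for j in hungry)
        if weight <= 0:
            break
        granted = 0.0
        for j in list(hungry):
            take = min(pool * shares[j] / weight, demand[j] - alloc[j])
            alloc[j] += take
            granted += take
            if demand[j] - alloc[j] <= 1e-9:
                hungry.discard(j)  # fully satisfied; leftovers go to others
        pool -= granted
    return alloc
```

For example, with shares of 100 and 900 for a small and a large job, if the large job is idle and the small job bursts to a demand of 600, the small job is granted 600 instead of being capped at 100. Because allocations are recomputed from the shares every interval, the large job reclaims its full 900 as soon as its own demand returns, which is one simple way to keep borrowers from starving lenders.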
Problem

HPC storage
bandwidth control
I/O fairness
resource allocation
bursty I/O
Innovation

Adaptive Token Borrowing
Decentralized Bandwidth Control
HPC Storage
Token Bucket Filter
I/O Fairness
Md Hasanur Rashid
Department of Computer and Information Sciences, University of Delaware, Newark, US.
Dong Dai
Associate Professor, University of Delaware
AI4HPC
HPC Storage
HPC I/O