D-Rex: Heterogeneity-Aware Reliability Framework and Adaptive Algorithms for Distributed Storage

📅 2025-05-29

🤖 AI Summary

Erasure coding is hard to deploy efficiently in heterogeneous distributed storage, where nodes differ in capacity, I/O performance, and failure rate. This paper proposes D-Rex, a pair of dynamic scheduling algorithms (D-Rex LB and D-Rex SC) that adaptively choose erasure coding parameters and map data chunks to heterogeneous nodes so that each data item meets its user-defined reliability requirement. D-Rex SC delivers robust storage utilization and throughput at a higher computational cost; D-Rex LB is faster but slightly less competitive. The paper also proposes two greedy algorithms: GreedyMinStorage, which optimizes storage utilization, and GreedyLeastUsed, which optimizes load balancing. In the reported experiments, the dynamic schedulers store on average 45% more data items than state-of-the-art algorithms without significantly degrading I/O throughput, and GreedyLeastUsed stores 21% more data items while also increasing throughput.

📝 Abstract

The exponential growth of data necessitates distributed storage models, such as peer-to-peer systems and data federations. While distributed storage can reduce costs and increase reliability, the heterogeneity in storage capacity, I/O performance, and failure rates of storage resources makes their efficient use a challenge. Further, node failures are common and can lead to data unavailability and even data loss. Erasure coding is a common resiliency strategy implemented in storage systems to mitigate failures by striping data across storage locations. However, erasure coding is computationally expensive, and existing systems do not consider the heterogeneous resources and their varied capacity and performance when placing data chunks. We tackle the challenges of using erasure coding with distributed and heterogeneous nodes, aiming to store as much data as possible, minimize encoding and decoding time, and meet user-defined reliability requirements for each data item. We propose two new dynamic scheduling algorithms, D-Rex LB and D-Rex SC, that adaptively choose erasure coding parameters and map chunks to heterogeneous nodes. D-Rex SC achieves robust performance for both storage utilization and throughput, at a higher computational cost, while D-Rex LB is faster but with slightly less competitive performance. In addition, we propose two greedy algorithms, GreedyMinStorage and GreedyLeastUsed, that optimize for storage utilization and load balancing, respectively. Our experimental evaluation shows that our dynamic schedulers store, on average, 45% more data items without significantly degrading I/O throughput compared to state-of-the-art algorithms, while GreedyLeastUsed is able to store 21% more data items while also increasing throughput.
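To make the reliability requirement concrete, here is a minimal Python sketch of how one might check whether an erasure coding configuration meets a target on heterogeneous nodes. It is an illustration, not the paper's algorithm. It assumes an item is split into K data chunks plus P parity chunks placed on K + P distinct nodes, that the item survives as long as at most P of those nodes fail, and that node failures are independent with known probabilities. The function names `prob_at_most_failures` and `min_parity_for_target` are hypothetical.

```python
from typing import List, Optional, Tuple


def prob_at_most_failures(fail_probs: List[float], max_failures: int) -> float:
    """Probability that at most `max_failures` of the given nodes fail.

    Assumes independent failures with per-node probabilities `fail_probs`
    (a Poisson-binomial distribution, computed by dynamic programming).
    """
    # dist[j] = probability that exactly j of the nodes seen so far failed
    dist = [1.0]
    for p in fail_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1.0 - p)  # this node survives
            nxt[j + 1] += q * p      # this node fails
        dist = nxt
    return sum(dist[: max_failures + 1])


def min_parity_for_target(
    fail_probs: List[float], target: float
) -> Optional[Tuple[int, int]]:
    """Return (K, P) with the fewest parity chunks that meets `target`.

    All len(fail_probs) nodes are used, so K = N - P. Returns None when
    no split over these nodes reaches the target.
    """
    n = len(fail_probs)
    for parity in range(n):  # keep at least one data chunk
        if prob_at_most_failures(fail_probs, parity) >= target:
            return n - parity, parity
    return None


# Three nodes, each with an assumed 10% failure probability, 95% target.
print(min_parity_for_target([0.1, 0.1, 0.1], 0.95))  # (2, 1)
```

Fewer parity chunks means lower storage overhead, since an item of size S occupies S * (K + P) / K in total, which is why the sketch searches from the smallest P upward.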

Problem

Research questions and friction points this paper is trying to address.

Optimizing erasure coding for heterogeneous distributed storage nodes

Minimizing encoding/decoding time while meeting reliability requirements

Balancing storage utilization and throughput in dynamic scheduling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Heterogeneity-aware adaptive erasure coding algorithms

Dynamic schedulers (D-Rex LB and D-Rex SC) that balance storage utilization, throughput, and scheduling cost

Greedy algorithms targeting storage utilization (GreedyMinStorage) and load balancing (GreedyLeastUsed)
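As a rough illustration of what a load-balancing greedy placement can look like, the Python sketch below places the K + P chunks of an item on the nodes with the lowest current utilization. It is not the paper's GreedyLeastUsed: the abstract does not spell out that algorithm, so the `Node` class, the `place_least_used` function, the utilization metric (used / capacity), and the assumption that K and P are given are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    capacity: float  # total capacity, assumed > 0
    used: float = 0.0

    @property
    def free(self) -> float:
        return self.capacity - self.used


def place_least_used(
    nodes: List[Node], item_size: float, k: int, p: int
) -> Optional[List[str]]:
    """Place k + p equally sized chunks on the least-utilized nodes.

    Returns the chosen node names, or None if fewer than k + p nodes
    have room for one chunk. Updates `used` on the chosen nodes.
    """
    chunk = item_size / k  # data and parity chunks share this size
    candidates = [n for n in nodes if n.free >= chunk]
    if len(candidates) < k + p:
        return None
    # Lowest utilization first, so load spreads across the cluster.
    candidates.sort(key=lambda n: n.used / n.capacity)
    chosen = candidates[: k + p]
    for n in chosen:
        n.used += chunk
    return [n.name for n in chosen]


cluster = [Node("a", 100, 50), Node("b", 100, 10),
           Node("c", 100, 30), Node("d", 10, 0)]
print(place_least_used(cluster, item_size=40, k=2, p=1))  # ['b', 'c', 'a']
```

Node "d" is skipped despite being empty because it cannot hold a 20-unit chunk, which shows why capacity heterogeneity has to be checked before ranking nodes by load.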
