🤖 AI Summary
Replication and erasure coding (EC) each fall short of simultaneously achieving low storage overhead, high reliability, and low repair traffic in distributed storage systems. To address this, the paper proposes HyRES, a network-scale-aware hybrid storage scheme that unifies replication and EC within a single framework rather than merely running them side by side. It introduces a dynamic tiered encoding strategy and a scale-adaptive repair scheduling mechanism to jointly optimize storage cost, file loss probability (FLP), and cross-network repair traffic. Theoretical modeling and large-scale simulations indicate that, under identical fault tolerance guarantees, HyRES reduces storage overhead by approximately 40% compared to pure replication, lowers FLP by over 50% relative to conventional EC, and significantly mitigates the growth of repair traffic as the network size increases.
📝 Abstract
Work on reliability in distributed storage systems has typically focused on the design and deployment of either data replication or erasure coding techniques. Although some systems use replication for hot data and erasure coding for cold data side by side, each scheme is designed in isolation. We propose HyRES, a hybrid scheme that incorporates the best characteristics of both, resulting in additional design flexibility and better potential performance for the system. We show that HyRES generalizes previously proposed hybrid schemes. We characterize the theoretical performance of HyRES, as well as that of replication and erasure coding, taking into account the effect of the size of the storage network. We validate our theoretical results using simulations. These results show that HyRES can simultaneously yield lower storage costs than replication, lower probabilities of file loss than replication and erasure coding with similar worst-case performance, and even lower effective repair traffic than replication when the network size is considered.