AI Summary
This study addresses data recovery after container failures in DNA-based distributed storage systems, where information is spread across multiple DNA storage containers. The authors present it as an initial study of this setting: they apply erasure-correcting codes as the fault-tolerance mechanism and analyze recovery time using generalized versions of the Coupon Collector's Problem. Because each sequencing read returns a strand sampled uniformly at random from a container, reading is modeled as a random sampling process, and the work quantifies the expected time to recover the data of a failed container for several erasure codes. The resulting analysis offers a theoretical basis for evaluating and designing reliable DNA-based distributed storage systems.
Abstract
We initiate the study of DNA-based distributed storage systems, where information is encoded across multiple DNA data storage containers to achieve robustness against container failures. In this setting, data are distributed over $M$ containers, and the objective is to guarantee that the contents of any failed container can be reliably reconstructed from the surviving ones. Unlike classical distributed storage systems, DNA data storage containers are fundamentally constrained by sequencing technology, since each read operation yields the content of a strand sampled uniformly at random from the container. Within this framework, we consider several erasure-correcting codes and analyze the expected recovery time of the data stored in a failed container. Our results are obtained by analyzing generalized versions of the classical Coupon Collector's Problem, which may be of independent interest.
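To make the link to the Coupon Collector's Problem concrete, the following is a minimal Python sketch of the classical single-container case only, not the generalized versions analyzed in the paper. It assumes a container holds $n$ distinct strands, each read returns one of them uniformly at random and independently of earlier reads, and reading stops once $k$ distinct strands have been seen. Under those assumptions the expected number of reads is $\sum_{i=0}^{k-1} n/(n-i)$, which equals $n H_n$ when $k = n$. The names `expected_reads` and `simulate_reads`, and the parameters `n`, `k`, `trials`, and `seed`, are hypothetical and chosen for illustration.

```python
import random
from fractions import Fraction


def expected_reads(n: int, k: int) -> Fraction:
    """Exact expected number of uniform reads needed to observe k distinct
    strands out of n (classical coupon collector, stopped after k coupons).

    Assumption (not from the paper): reads are independent and uniform
    over the n strands of a single container.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    # With i distinct strands already seen, a read is new with probability
    # (n - i) / n, so the wait for the next new strand averages n / (n - i).
    return sum((Fraction(n, n - i) for i in range(k)), Fraction(0))


def simulate_reads(n: int, k: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seen = set()
        reads = 0
        while len(seen) < k:
            seen.add(rng.randrange(n))  # one sequencing read
            reads += 1
        total += reads
    return total / trials


if __name__ == "__main__":
    n = 20
    for k in (10, 15, 20):
        exact = float(expected_reads(n, k))
        approx = simulate_reads(n, k)
        print(f"n={n} k={k} exact={exact:.2f} simulated={approx:.2f}")
```

The last few distinct strands dominate the cost, since each term $n/(n-i)$ grows as $i$ approaches $n$. One illustrative reading, stated here as an assumption rather than as a result from the paper, is that a code able to recover from any $k < n$ distinct strands would avoid that expensive tail.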