🤖 AI Summary
This study addresses the reconstruction of noisy, unordered DNA fragments produced by strand decay in DNA-based data storage, modeling the decay process as a torn-paper channel with substitution errors. The work embeds either static markers or data-dependent hash markers into the encoded sequences to enable reconstruction; both tools have recently been used for the noiseless torn-paper channel. The coding schemes combine marker design, hash functions, and reconstruction algorithms, and are evaluated by simulation. The results show a complementary trade-off between the two marker types: static markers perform better at higher substitution probabilities, while data-dependent markers perform better at lower noise levels. Both approaches achieve reconstruction rates exceeding 99% with no false decodings observed, and performance is primarily limited by computational resources.
📝 Abstract
To make DNA a suitable medium for archival data storage, it is essential to consider the decay process of the strands observed in DNA storage systems. This paper models the decay process as a probabilistic noisy torn paper channel (TPC), which first corrupts the bits of the transmitted sequence by random substitutions, then breaks the sequence into a set of noisy, unordered substrings. The present work devises coding schemes for the noisy TPC by embedding markers in the transmitted sequence. We investigate the use of static markers and of markers tied to the data through hash functions. These two tools have also recently been exploited to tackle the noiseless TPC. Simulations show that static markers excel at higher substitution probabilities, while data-dependent markers are superior at lower noise levels. Both approaches achieve reconstruction rates exceeding $99\%$ with no false decodings observed; the achievable rates are primarily limited by computational resources.
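The two-stage channel described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's code: the function name `noisy_torn_paper_channel`, the parameters `p_sub` and `p_break`, and the choice of independent breaks after each position (which gives roughly geometric fragment lengths) are all assumptions made for illustration, since the abstract does not specify how break points are distributed.

```python
import random


def noisy_torn_paper_channel(bits, p_sub, p_break, rng=None):
    """Simulate a noisy torn paper channel on a list of 0/1 bits.

    Assumed model (hypothetical parameters, not taken from the paper):
      p_sub   - probability that each bit is flipped independently
      p_break - probability of a tear after each position
    Returns a shuffled list of fragments (each a list of bits).
    """
    rng = rng or random.Random()

    # Stage 1: substitutions. Each bit is flipped with probability p_sub.
    noisy = [(b ^ 1) if rng.random() < p_sub else b for b in bits]

    # Stage 2: tearing. Cut after each position except the last
    # with probability p_break.
    fragments, current = [], []
    for i, b in enumerate(noisy):
        current.append(b)
        if i < len(noisy) - 1 and rng.random() < p_break:
            fragments.append(current)
            current = []
    if current:
        fragments.append(current)

    # Stage 3: the receiver sees the fragments in no particular order.
    rng.shuffle(fragments)
    return fragments


# Example run with a fixed seed so the output is reproducible.
message = [1, 0, 1, 1, 0, 0, 1, 0] * 4
for fragment in noisy_torn_paper_channel(message, 0.05, 0.2, random.Random(7)):
    print("".join(map(str, fragment)))
```

The output carries no positional information, which is why a decoder needs extra structure in the transmitted sequence, such as the markers studied here, to place each fragment and to tell corrupted fragments from valid ones.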