🤖 AI Summary
This work addresses limited-magnitude probability errors, bounded in magnitude (≤ ℓ) and in number (≤ t), that arise during synthesis and sequencing in DNA storage based on composite DNA letters, where information is represented by probability vectors. It develops a theoretical framework for error correction under this error model. Information-theoretically, the authors derive outer and inner bounds for limited-magnitude probability error-correcting codes. Constructively, they propose several explicit block code constructions for a fixed probability resolution k and prove that one of them is asymptotically optimal. Practically, they present systematic codes based on one of the constructions, which allow efficient information extraction. The constructions exploit the structure of limited-magnitude probability errors to reduce complexity and redundancy.
📝 Abstract
DNA, with its remarkable density, durability, and replicability, is one of the most appealing storage media. Emerging DNA storage technologies use composite DNA letters, where information is represented by probability vectors, leading to higher information density and lower synthesis costs than regular DNA letters. However, composite DNA letters are subject to inevitable noise and information corruption. This paper explores the channel of composite DNA letters in DNA-based storage systems and introduces block codes for limited-magnitude probability errors on probability vectors. First, outer and inner bounds for limited-magnitude probability error correction codes are provided. Moreover, code constructions are proposed where the number of errors is bounded by t, the error magnitudes are bounded by l, and the probability resolution is fixed as k. These constructions leverage the properties of limited-magnitude probability errors in DNA-based storage systems, leading to improved performance in terms of complexity and redundancy. In addition, the asymptotic optimality of one of the proposed constructions is established. Finally, systematic codes based on one of the proposed constructions are presented, which enable efficient information extraction for practical implementation.
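To make the error model concrete, the following minimal Python sketch shows one possible reading of it. It rests on assumptions that the abstract does not spell out: a composite letter is stored as four integer counts for (A, C, G, T) in units of 1/k that sum to k, an erroneous symbol is one whose received vector differs from the sent one, and the error magnitude is the largest per-coordinate change in units of 1/k. The function names are hypothetical, and the paper's formal definitions may differ.

```python
from math import comb
from typing import Sequence, Tuple

# Assumed representation: counts for (A, C, G, T) in units of 1/k.
Letter = Tuple[int, int, int, int]


def is_valid_letter(letter: Letter, k: int) -> bool:
    """A letter is valid if it has 4 non-negative entries summing to k."""
    return len(letter) == 4 and all(c >= 0 for c in letter) and sum(letter) == k


def alphabet_size(k: int) -> int:
    """Number of probability vectors of length 4 at resolution k.

    This counts the ways to write k as an ordered sum of 4 non-negative
    integers, which is C(k + 3, 3).
    """
    return comb(k + 3, 3)


def within_error_model(
    sent: Sequence[Letter], received: Sequence[Letter], k: int, t: int, l: int
) -> bool:
    """Check the assumed (t, l) model: at most t symbols change, and no
    coordinate of any symbol moves by more than l (in units of 1/k)."""
    if len(sent) != len(received):
        return False
    if not all(is_valid_letter(x, k) for x in list(sent) + list(received)):
        return False
    erroneous_symbols = 0
    for x, y in zip(sent, received):
        magnitude = max(abs(a - b) for a, b in zip(x, y))
        if magnitude > l:
            return False  # error magnitude exceeds the bound l
        if magnitude > 0:
            erroneous_symbols += 1
    return erroneous_symbols <= t


# Hypothetical example with resolution k = 10.
sent = [(5, 5, 0, 0), (10, 0, 0, 0), (2, 3, 2, 3)]
small_shift = [(4, 6, 0, 0), (10, 0, 0, 0), (2, 3, 2, 3)]  # one symbol, magnitude 1
large_shift = [(7, 3, 0, 0), (10, 0, 0, 0), (2, 3, 2, 3)]  # one symbol, magnitude 2
print(within_error_model(sent, small_shift, k=10, t=1, l=1))
print(within_error_model(sent, large_shift, k=10, t=1, l=1))
```

Under these assumptions, the first received word lies within the t = 1, l = 1 model and the second does not, because a coordinate moved by 2/k. The alphabet-size helper illustrates why composite letters raise information density: at k = 10 there are 286 distinct letters instead of 4.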