🤖 AI Summary
This work addresses insertion and deletion errors in DNA data storage when composite coding is combined with rank modulation. In this setting, information is encoded through the relative order of nucleotide frequencies within composite DNA symbols rather than through the exact frequency values. Earlier work on this combination treated errors under the Kendall tau distance; this paper instead studies deletion and insertion codes. It presents bounds and constructions of efficient codes defined over partial permutations.
📝 Abstract
We study the combination of two recent coding approaches in the context of DNA-based data storage. Composite DNA alphabets leverage properties of the DNA synthesis and sequencing processes. A composite symbol does not represent a single nucleotide, but rather a designed mixture of DNA nucleotides. Using the high multiplicity that is intrinsic to synthesis and sequencing, a composite symbol is specified by the frequencies of the nucleotides in the mixture. Rank modulation codes use permutations to represent information. Combining the two, we construct an encoding that uses permutations of nucleotide frequencies rather than the exact frequency values. Codes for this approach were addressed in previous work under the Kendall tau distance. In this work we study deletion and insertion codes. We present bounds and constructions of efficient codes defined over partial permutations.
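To make the idea of ranking nucleotide frequencies concrete, the following minimal Python sketch may help. It is an illustration only and is not taken from the paper: the function names (`ranking_from_counts`, `kendall_tau`, `single_deletions`), the tie-breaking rule, and the read counts are all hypothetical, and the deletion helper shows a generic single-deletion channel on a sequence of rankings rather than the specific partial-permutation model the authors analyze.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"


def ranking_from_counts(counts):
    """Order nucleotides from most to least frequent in the observed reads.

    Ties are broken alphabetically (an arbitrary choice for this sketch).
    """
    return tuple(sorted(NUCLEOTIDES, key=lambda n: (-counts.get(n, 0), n)))


def kendall_tau(p, q):
    """Count pairs of symbols that appear in opposite relative order in p and q."""
    pos = {symbol: i for i, symbol in enumerate(q)}
    return sum(1 for a, b in combinations(p, 2) if pos[a] > pos[b])


def single_deletions(sequence):
    """Return every sequence obtained by deleting exactly one element."""
    sequence = tuple(sequence)
    return {sequence[:i] + sequence[i + 1:] for i in range(len(sequence))}


# Designed mixture versus noisy sequencing read counts (made-up numbers).
designed = {"A": 50, "C": 30, "G": 15, "T": 5}
observed = {"A": 48, "C": 29, "G": 12, "T": 11}

# The exact frequencies differ, but the induced ranking is the same.
same_ranking = ranking_from_counts(designed) == ranking_from_counts(observed)

# One adjacent swap between two rankings gives Kendall tau distance 1.
distance = kendall_tau(("A", "C", "G", "T"), ("C", "A", "G", "T"))

# A codeword viewed as a sequence of rankings, with one ranking deleted.
codeword = [("A", "C", "G", "T"), ("T", "G", "C", "A"), ("C", "A", "T", "G")]
received_candidates = single_deletions(codeword)
```

In this sketch the ranking survives moderate noise in the read counts, which is the motivation for storing information in the order of the frequencies instead of their values. The deletion helper only enumerates possible channel outputs; it does not perform any error correction.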