🤖 AI Summary
Problem: Insertion, deletion, and substitution (IDS) errors in DNA-based storage cause severe synchronization loss. Existing marker codes incur high synchronization overhead and limit the mutual information achievable over the channel.
Method: We propose half-marker codes, which exploit the fact that each DNA nucleotide represents two bits. One bit is reserved for synchronization (marking) and the other carries data, so synchronization and data are encoded jointly.
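The joint encoding can be illustrated with a minimal sketch. It assumes one plausible block layout: `n_c` plain data bases (two data bits each) followed by one half-marker base per marker bit, where each half-marker base combines a known marker bit with a data bit. The bit-to-base mapping, the layout, and the names `encode_half_marker` and `decode_half_marker` are hypothetical illustrations, not the authors' construction.

```python
# Hypothetical bijection between (first bit, second bit) and nucleotides.
# At half-marker positions the first bit is the known marker bit (assumption).
BASE_OF = {(0, 0): "A", (0, 1): "C", (1, 0): "G", (1, 1): "T"}
BITS_OF = {base: bits for bits, base in BASE_OF.items()}


def encode_half_marker(data_bits, marker_bits, n_c):
    """Map data bits to nucleotides, block by block (illustrative layout).

    Each block: n_c plain data bases (two data bits each), then one
    half-marker base per entry of marker_bits (one marker bit + one data bit).
    """
    n_m = len(marker_bits)
    bits_per_block = 2 * n_c + n_m
    if len(data_bits) % bits_per_block:
        raise ValueError("len(data_bits) must be a multiple of 2*n_c + len(marker_bits)")
    it = iter(data_bits)
    out = []
    for _ in range(len(data_bits) // bits_per_block):
        for _ in range(n_c):
            out.append(BASE_OF[(next(it), next(it))])  # two data bits
        for m in marker_bits:
            out.append(BASE_OF[(m, next(it))])  # marker bit + data bit
    return "".join(out)


def decode_half_marker(seq, marker_bits, n_c):
    """Invert encode_half_marker for an error-free sequence and check markers."""
    block_len = n_c + len(marker_bits)
    data = []
    for start in range(0, len(seq), block_len):
        block = seq[start:start + block_len]
        for base in block[:n_c]:
            data.extend(BITS_OF[base])
        for offset, (m, base) in enumerate(zip(marker_bits, block[n_c:])):
            marker_bit, data_bit = BITS_OF[base]
            if marker_bit != m:
                raise ValueError(f"marker mismatch at base {start + n_c + offset}")
            data.append(data_bit)
    return data
```

For example, `encode_half_marker([1, 0, 1, 1, 0, 1], [0, 1], 2)` returns `"GTAT"`. The output consists of two data bases, `G` (bits 10) and `T` (bits 11), followed by two half-marker bases, `A` (marker 0, data 0) and `T` (marker 1, data 1). A real decoder would not simply check markers in place. It would use them inside a soft-output synchronization algorithm over the IDS channel.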
Contribution/Results: In conventional marker codes, every marker symbol is dedicated entirely to synchronization. Half-marker codes also carry data in the marker symbols, which halves the synchronization overhead for a given number of marker positions and expands the design space of marker codes. We combine an information-theoretic analysis based on the soft outputs of an IDS channel model with simulations in which the code is concatenated with an outer error-correcting code. The results show that half-marker codes can substantially increase mutual information and significantly reduce the end-to-end bit error rate compared with standard marker codes. This makes them a promising synchronization-coding approach for DNA storage.
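The "halving" claim follows from simple counting. The sketch below assumes the same hypothetical block layout used for illustration: `n_c` data bases followed by `n_m` marker bases. It compares data bits per nucleotide and the fraction of raw bits spent on synchronization. The function names are illustrative only.

```python
from fractions import Fraction


def bits_per_base(n_c, n_m, half_marker):
    """Data bits carried per nucleotide for one block of n_c data bases
    followed by n_m marker bases (assumed block layout)."""
    data_bits = 2 * n_c + (n_m if half_marker else 0)
    return Fraction(data_bits, n_c + n_m)


def sync_overhead(n_c, n_m, half_marker):
    """Fraction of the block's 2*(n_c + n_m) raw bits that are marker bits."""
    marker_bits = n_m if half_marker else 2 * n_m
    return Fraction(marker_bits, 2 * (n_c + n_m))


for half in (False, True):
    print(half, bits_per_base(10, 2, half), sync_overhead(10, 2, half))
```

With `n_c = 10` and `n_m = 2`, standard markers give 5/3 data bits per base with overhead 1/6. Half-markers give 11/6 data bits per base with overhead 1/12. This counting says nothing about how well each scheme resynchronizes after insertions or deletions. That question is what the mutual-information analysis addresses.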
📝 Abstract
DNA storage systems face significant challenges, including insertion, deletion, and substitution (IDS) errors. Therefore, designing effective synchronization codes, i.e., codes capable of correcting IDS errors, is essential for DNA storage systems. Marker codes are a favorable choice for this purpose. In this paper, we extend the notion of marker codes by making the following key observation. Since each DNA base is equivalent to a 2-bit storage unit, one bit can be reserved for synchronization, while the other is dedicated to data transmission. Using this observation, we propose a new class of marker codes, which we refer to as half-marker codes. We demonstrate that this extension has the potential to significantly increase the mutual information between the input symbols and the soft outputs of an IDS channel modeling a DNA storage system. Specifically, through examples, we show that when concatenated with an outer error-correcting code, half-marker codes outperform standard marker codes and significantly reduce the end-to-end bit error rate of the system.
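To make the IDS channel concrete, here is a minimal simulator in the style of the Davey–MacKay insertion/deletion/substitution model, applied to nucleotides. The parameter names (`p_ins`, `p_del`, `p_sub`) and the function `ids_channel` are assumptions for illustration. The exact channel model used in the paper may differ, and the soft-output decoder that produces the channel outputs is not shown.

```python
import random

BASES = "ACGT"


def ids_channel(seq, p_ins, p_del, p_sub, rng=None):
    """Pass a nucleotide string through a simple IDS channel (illustrative).

    At each step, with probability p_ins a uniformly random base is inserted
    and the current input base waits for the next step; with probability
    p_del the current input base is deleted; otherwise it is transmitted and,
    with probability p_sub, replaced by one of the three other bases.
    """
    if not (0 <= p_ins < 1 and p_del >= 0 and p_ins + p_del <= 1 and 0 <= p_sub <= 1):
        raise ValueError("need 0 <= p_ins < 1, p_del >= 0, p_ins + p_del <= 1, 0 <= p_sub <= 1")
    if rng is None:
        rng = random.Random()
    out = []
    i = 0
    while i < len(seq):
        u = rng.random()
        if u < p_ins:
            out.append(rng.choice(BASES))  # insertion: input position unchanged
        elif u < p_ins + p_del:
            i += 1  # deletion: input base skipped, nothing emitted
        else:
            base = seq[i]
            if rng.random() < p_sub:
                base = rng.choice([b for b in BASES if b != base])  # substitution
            out.append(base)
            i += 1
    return "".join(out)


received = ids_channel("ACGTACGTACGT", p_ins=0.02, p_del=0.02, p_sub=0.03, rng=random.Random(0))
```

Because insertions and deletions shift every later base, the received string generally has a different length from the input. This misalignment is the synchronization problem that marker and half-marker codes are designed to resolve.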