AI Summary
DNA-based long-term data storage suffers from strand breaks that occur during storage, yet no existing channel model or error-correction framework systematically addresses this structural damage for DNA produced by composite synthesis.
Method: This work introduces a novel channel model for composite DNA that explicitly captures storage-induced strand breaks. It proposes a marker-code-based scheme for correcting single breaks. For this purpose, it generalizes run-length-limited (RLL) codes to the composite setting and derives bounds on the code size.
Results: The proposed scheme corrects single strand breaks in composite DNA, targeting what experiments indicate is the most common error type arising from storage. It pairs a constructive marker-code design with bounds on the code size, combining theoretical analysis with a practical construction for reliable DNA data storage.
Abstract
Even though DNA can be considered a very stable long-term storage medium, errors must be expected during storage. Experiments show that the most common error type caused by storage is the strand break. We address the problem of correcting strand breaks in DNA sequences resulting from composite DNA synthesis. We introduce a novel channel model with realistic assumptions about the errors resulting from long-term storage. Our proposed coding scheme employs marker codes to correct single breaks. For this purpose, we generalize run-length-limited codes to the composite setting and derive bounds on the code size.
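
As background for the run-length-limited codes mentioned above, the following minimal sketch counts sequences under the classical (non-composite) RLL constraint, where no symbol may repeat more than a fixed number of times in a row. It does not reproduce the composite generalization or the bounds derived in this work. The function name `count_rll` and its parameters are hypothetical and chosen only for illustration.

```python
def count_rll(n, max_run, q=4):
    """Count length-n sequences over a q-ary alphabet (q=4 for A, C, G, T)
    in which no symbol occurs more than max_run times consecutively.

    Classical RLL counting only; an illustrative assumption, not the
    composite-setting construction from the paper.
    """
    if n == 0:
        return 1  # the empty sequence trivially satisfies the constraint
    # runs[r] = number of valid sequences whose final run has length exactly r
    runs = [0] * (max_run + 1)
    runs[1] = q  # any single symbol forms a run of length 1
    for _ in range(n - 1):
        total = sum(runs)
        nxt = [0] * (max_run + 1)
        # Appending a different symbol starts a new run of length 1.
        nxt[1] = (q - 1) * total
        # Appending the same symbol extends the current run by one.
        for r in range(2, max_run + 1):
            nxt[r] = runs[r - 1]
        runs = nxt
    return sum(runs)
```

For example, `count_rll(3, 2, q=2)` counts the binary strings of length 3 other than `000` and `111`, and `count_rll(n, 1)` counts DNA sequences with no two equal adjacent bases.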