🤖 AI Summary
Synthesis is among the most expensive parts of a DNA storage system. In parallel synthesis, the machine builds many strands at once by following a fixed supersequence, and the strands must also satisfy two constraints: an *l*-runlength limit (no homopolymer run longer than *l*) and *ε*-balance (GC-content deviation ≤ *ε*). Existing coding schemes do not satisfy both constraints simultaneously.
Method: The paper proposes constrained codes that satisfy both constraints simultaneously, with polynomial-time encoding and decoding algorithms. The codes are obtained by enumerating all valid sequences, and a further construction adds correction of a single insertion or deletion while keeping both constraints.
Contribution/Results: The codes achieve the maximum rate, matching the capacity under the two constraints. The error-correcting variant corrects one insertion or deletion in the obtained DNA sequence while still adhering to both constraints.
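To make the two constraints concrete, here is a minimal Python sketch of a checker. It is not taken from the paper: the function names are hypothetical, and it assumes that GC-content is measured as a fraction of the strand length and that the deviation is taken from 1/2.

```python
def max_run_length(seq: str) -> int:
    """Length of the longest homopolymer run (same base repeated) in seq."""
    longest = 0
    current = 0
    prev = None
    for base in seq:
        current = current + 1 if base == prev else 1
        longest = max(longest, current)
        prev = base
    return longest


def is_runlength_limited(seq: str, l: int) -> bool:
    """True if no homopolymer run in seq is longer than l."""
    return max_run_length(seq) <= l


def gc_fraction(seq: str) -> float:
    """Fraction of positions in seq that are G or C."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    return sum(base in "GC" for base in seq) / len(seq)


def is_balanced(seq: str, eps: float) -> bool:
    """True if the GC fraction is within eps of 1/2 (assumed definition)."""
    return abs(gc_fraction(seq) - 0.5) <= eps


def satisfies_both(seq: str, l: int, eps: float) -> bool:
    """True if seq meets the runlength limit and the balance condition."""
    return is_runlength_limited(seq, l) and is_balanced(seq, eps)
```

For example, `satisfies_both("AACCGGTT", 2, 0.1)` holds, while `"AAACGT"` fails the runlength limit for `l = 2`.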
📝 Abstract
DNA synthesis is considered one of the most expensive components of current DNA storage systems. In this paper, we focus on a common synthesis machine that generates multiple DNA strands in parallel by following a fixed supersequence, and we propose constrained codes with polynomial-time encoding and decoding algorithms. In contrast to existing works, our codes simultaneously satisfy both the l-runlength-limited and ε-balanced constraints. By enumerating all valid sequences, our codes achieve the maximum rate, matching the capacity. Additionally, we design constrained error-correcting codes capable of correcting one insertion or deletion in the obtained DNA sequence while still adhering to the constraints.
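The following Python sketch illustrates what "following a fixed supersequence" means for synthesis cost. It rests on assumptions not stated in the abstract: the supersequence is taken to be the periodic sequence ACGTACGT..., in each cycle the machine offers one nucleotide, and a strand is extended only when that nucleotide is the next one it needs. The function names are hypothetical.

```python
from itertools import cycle


def synthesis_cycles(strand: str, period: str = "ACGT") -> int:
    """Number of machine cycles needed to finish one strand.

    Equivalently, the length of the shortest prefix of the periodic
    supersequence that contains the strand as a subsequence.
    """
    if any(base not in period for base in strand):
        raise ValueError("strand contains a base missing from the supersequence")
    if not strand:
        return 0
    pos = 0  # index of the next base the strand still needs
    for cycles, offered in enumerate(cycle(period), start=1):
        if offered == strand[pos]:
            pos += 1
            if pos == len(strand):
                return cycles


def batch_synthesis_cycles(strands: list[str], period: str = "ACGT") -> int:
    """Cycles needed for a parallel batch: the slowest strand decides."""
    return max((synthesis_cycles(s, period) for s in strands), default=0)
```

Under these assumptions, `"ACGT"` finishes in 4 cycles while `"TGCA"` needs 13, which is why the choice of which sequences to allow affects synthesis cost.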