🤖 AI Summary
This work addresses the vulnerability of DNA-based data storage to noise during synthesis, preservation, and sequencing, where conventional error-correcting codes fail once errors exceed predefined thresholds. To overcome this limitation, the authors propose a Partitioning-mapping with Jump-rotating (PJ) encoding scheme that eliminates inter-strand dependencies, so that strand loss becomes a set of localized information gaps that AI-based inference can fill for controllable recovery. The result is a general DNA storage framework that requires no prior knowledge of error probabilities and allows files to be decoded under any strand loss ratio, with fidelity degrading smoothly as damage increases. Experiments show that files can be effectively recovered with 10% strand loss, that machine learning datasets stored under these conditions retain their classification performance, and that image files remain decodable after accelerated aging and high-intensity X-ray irradiation.
📝 Abstract
Encoding digital information into DNA sequences is an attractive potential solution for storing the rapidly growing volume of data in the information age and with the rise of artificial intelligence. However, practical implementations of DNA storage are constrained by errors introduced during synthesis, preservation, and sequencing, and traditional error-correcting codes remain vulnerable to noise levels that exceed predefined thresholds. Here, we develop a Partitioning-mapping with Jump-rotating (PJ) encoding scheme that exhibits exceptional noise resilience. PJ removes cross-strand information dependencies so that strand loss manifests as localized gaps rather than catastrophic file failure. It prioritizes file decodability under arbitrary noise conditions and leverages AI-based inference to enable controllable recovery of digital information. For the intra-strand encoding, we develop a jump-rotating strategy that relaxes sequence constraints relative to conventional rotating codes and provides tunable information density via an adjustable jump length. With this encoding architecture, the original file can always be decoded and recovered under any strand loss ratio, with fidelity degrading smoothly as damage increases. We demonstrate that original files can be effectively recovered even with 10% strand loss, and that machine learning datasets stored under these conditions retain their classification performance. Experiments further confirm that PJ successfully decodes image files after extreme environmental disturbance by accelerated aging and high-intensity X-ray irradiation. By eliminating reliance on prior error probabilities, PJ establishes a general framework for robust, archival DNA storage capable of withstanding the rigorous conditions of real-world preservation.
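To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It uses a conventional rotating code (each base-3 digit selects one of the three nucleotides that differ from the previous one, which avoids homopolymer runs) as the baseline the abstract contrasts against; the jump-rotating variant and its jump length are not specified here and are not modeled. The function names, the starting base `"A"`, the chunk length, and the plain integer strand index are all assumptions chosen for illustration.

```python
BASES = "ACGT"


def rotate_encode(trits, prev="A"):
    """Conventional rotating code: map base-3 digits to nucleotides,
    never emitting the same base twice in a row."""
    strand = []
    for t in trits:
        choices = [b for b in BASES if b != prev]  # three candidates
        prev = choices[t]
        strand.append(prev)
    return "".join(strand)


def rotate_decode(strand, prev="A"):
    """Invert rotate_encode for a single strand."""
    trits = []
    for base in strand:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


def partition(trits, chunk_len):
    """Split the payload into (index, strand) pairs. Each strand is encoded
    on its own, so it can be decoded without any other strand.
    (Hypothetical layout: a real scheme would also store the index in DNA.)"""
    return [
        (i // chunk_len, rotate_encode(trits[i:i + chunk_len]))
        for i in range(0, len(trits), chunk_len)
    ]


def recover(strands, n_chunks):
    """Decode whichever strands survived; lost chunks stay as None gaps."""
    chunks = [None] * n_chunks
    for index, strand in strands:
        chunks[index] = rotate_decode(strand)
    return chunks


payload = [0, 1, 2, 2, 1, 0, 0, 0, 2, 1, 1, 2]
strands = partition(payload, chunk_len=4)
survivors = [s for s in strands if s[0] != 1]  # simulate losing strand 1
print(recover(survivors, n_chunks=len(strands)))
```

Because no strand depends on another, losing strand 1 leaves a single `None` gap while the remaining chunks decode exactly. Filling such gaps by inference (for example, image inpainting) is the recovery step the abstract describes, and it is outside the scope of this sketch.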