🤖 AI Summary
This work characterizes the capacity of DNA storage systems built from short molecules, completing the proof of the conjecture by Shomorony and Heckel (2022) on how the number of reliably storable information bits scales. It studies two coding schemes. The first is a random-coding scheme in which each codeword is obtained by quantizing a probability mass function drawn at random from the probability simplex. Analyzing this scheme under the optimal maximum-likelihood decoder yields an achievability bound that matches a recently established converse bound across the entire short-molecule regime, which settles the capacity scaling law. The second scheme has significantly lower computational complexity and also achieves the optimal scaling, except for a specific range of very short molecules. Together, these results give an information-theoretic characterization of DNA-based data storage in the short-molecule regime.
📝 Abstract
We study the amount of reliable information that can be stored in a DNA-based storage system composed of short DNA molecules. In this regime, Shomorony and Heckel (2022) put forward a conjecture on the scaling of the number of information bits that can be reliably stored. In this paper, we complete the proof of this conjecture. We analyze a random-coding scheme in which each codeword is obtained by quantizing a randomly generated probability mass function drawn from the probability simplex. By analyzing the optimal maximum-likelihood decoder, we derive an achievability bound that matches a recently established converse bound across the entire short-molecule regime. We also propose a second coding scheme, which operates with significantly lower computational complexity but achieves the optimal scaling, except for a specific range of very short molecules.
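To make the codeword construction concrete, the following is a minimal sketch of one way to draw a random probability mass function from the simplex and quantize it into integer molecule counts. It is an illustration under stated assumptions, not the construction analyzed in the paper: the uniform (flat Dirichlet) draw, the largest-remainder rounding rule, and the names `random_pmf`, `quantize_pmf`, `random_codeword`, `mol_len`, and `num_molecules` are all hypothetical choices that the abstract does not specify.

```python
import math
import random


def random_pmf(k, rng):
    """Draw a PMF over k symbols uniformly from the probability simplex.

    Assumption: a flat Dirichlet draw, implemented by normalizing
    independent Exp(1) variates. The abstract does not name the distribution.
    """
    weights = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


def quantize_pmf(pmf, num_molecules):
    """Round num_molecules * pmf to non-negative integer counts that sum
    to num_molecules, using largest-remainder rounding (an assumed rule)."""
    scaled = [p * num_molecules for p in pmf]
    counts = [math.floor(s) for s in scaled]
    leftover = num_molecules - sum(counts)
    # Give the remaining units to the entries with the largest fractional parts.
    by_remainder = sorted(
        range(len(pmf)), key=lambda i: scaled[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def random_codeword(mol_len, num_molecules, rng):
    """One hypothetical codeword: how many copies of each of the
    4**mol_len possible molecules (alphabet A, C, G, T) to store."""
    return quantize_pmf(random_pmf(4 ** mol_len, rng), num_molecules)


rng = random.Random(0)
codeword = random_codeword(mol_len=3, num_molecules=1000, rng=rng)
print(len(codeword), sum(codeword))
```

In this sketch a codeword is a histogram over molecule types rather than an ordered sequence, which reflects the idea that with short molecules the information is carried by how often each molecule appears. A maximum-likelihood decoder would then compare the observed molecule counts against each codeword's histogram. That decoding step is omitted here because the abstract does not describe the channel model in enough detail to sketch it faithfully.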