🤖 AI Summary
This paper studies the random access coverage depth problem in DNA data storage: minimizing the expected number of reads needed to recover a target information strand when the strands are encoded with a linear code. It computes the asymptotic performance of a recently proposed code construction and gives two independent proofs that establish and refine a conjecture in the field. It also analyzes a geometric code construction based on balanced quasi-arcs and optimizes its parameters. Finally, it investigates the full probability distribution of the random variables arising in the coverage depth problem, not only their expectation, which is just the first moment. This distinguishes code constructions that appear to behave identically when compared by expected read count alone.
📝 Abstract
DNA data storage systems encode digital data into DNA strands, enabling dense and durable storage. Efficient data retrieval depends on coverage depth, a key performance metric. We study the random access coverage depth problem and focus on minimizing the expected number of reads needed to recover information strands encoded via a linear code. We compute the asymptotic performance of a recently proposed code construction, establishing and refining a conjecture in the field by giving two independent proofs. We also analyze a geometric code construction based on balanced quasi-arcs and optimize its parameters. Finally, we investigate the full distribution of the random variables that arise in the coverage depth problem, of which the traditionally studied expectation is just the first moment. This allows us to distinguish between code constructions that, at first glance, may appear to behave identically.
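To make the quantity being studied concrete, the following is a minimal Monte Carlo sketch, not the paper's method. It assumes a binary code (the paper's setting may use a general finite field), reads drawn uniformly at random with replacement from the encoded strands, and recovery of the target information strand as soon as its unit vector lies in the span of the columns read so far. The function names `reduce`, `add_to_basis`, `reads_until_recovered`, and `simulate` are hypothetical, and each generator matrix column is stored as an integer bitmask.

```python
import random
from statistics import mean


def reduce(vec, basis):
    """Reduce vec against an XOR basis keyed by leading bit.

    Returns 0 exactly when vec lies in the GF(2) span of the basis.
    """
    while vec:
        lead = vec.bit_length() - 1
        if lead not in basis:
            return vec
        vec ^= basis[lead]
    return 0


def add_to_basis(vec, basis):
    """Insert vec into the basis if it is independent of the current span."""
    rem = reduce(vec, basis)
    if rem:
        basis[rem.bit_length() - 1] = rem


def reads_until_recovered(columns, target, rng):
    """Count uniform random reads until target is in the span of the reads.

    Assumption: columns are the encoded strands (generator matrix columns
    over GF(2)), and target is the unit vector of the wanted strand.
    """
    full = {}
    for col in columns:
        add_to_basis(col, full)
    if reduce(target, full):
        raise ValueError("target is not in the span of the encoded strands")
    basis, reads = {}, 0
    while reduce(target, basis):
        add_to_basis(rng.choice(columns), basis)
        reads += 1
    return reads


def simulate(columns, target, trials=20000, seed=0):
    """Return one sample of the read count per trial (the empirical distribution)."""
    rng = random.Random(seed)
    return [reads_until_recovered(columns, target, rng) for _ in range(trials)]


if __name__ == "__main__":
    uncoded = [0b001, 0b010, 0b100]   # k = 3 information strands, no redundancy
    with_parity = uncoded + [0b111]   # illustrative code: one added parity strand
    for name, cols in [("uncoded", uncoded), ("with parity", with_parity)]:
        samples = simulate(cols, target=0b001)
        print(name, "mean reads:", mean(samples), "max reads:", max(samples))
```

In the uncoded case the target strand is drawn with probability 1/3 on each read, so the sample mean should be close to k = 3. Because `simulate` returns every sample rather than only the average, the same output can be used to compare variance or tail behavior between codes, which is the kind of distributional comparison the abstract describes.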