🤖 AI Summary
This paper addresses the random access problem in DNA storage: given $k$ target messages encoded across $n$ DNA strands, how to retrieve any single message with minimal expected read count. We first establish a tight theoretical lower bound of $0.914 \times 2$ for $k = 2$, and present an optimal explicit construction achieving it. Building on $B_{k-1}$ sequences, we propose a general, explicit coding framework scalable to arbitrary $k$. For $k = 4$, our scheme achieves the lowest known expected read count, outperforming all prior approaches. Technically, the work integrates combinatorial code design, $B_h$-sequence theory, finite-field constructions, and probabilistic expectation optimization, balancing theoretical optimality with practical constructibility.
📝 Abstract
We study the Random Access Problem in DNA storage, which addresses the challenge of retrieving a specific information strand from a DNA-based storage system. In this setting, $k$ information strands, representing the data, are encoded into $n$ strands using a code. The goal is to identify and analyze codes that minimize the expected number of reads required to retrieve any of the $k$ information strands, where each read returns one of the $n$ encoded strands chosen uniformly at random. We fully solve the case $k=2$, showing that the best possible code attains a random access expectation of $0.914 \cdot 2$. Moreover, we generalize a construction from \cite{GMZ24}, specific to $k=3$, to any value of $k$. Our construction uses $B_{k-1}$ sequences over $\mathbb{Z}_{q-1}$, which always exist over large finite fields. For $k=4$, we show that this generalized construction outperforms all previous constructions in terms of reducing the random access expectation.
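To make the quantity being minimized concrete, the following is a minimal sketch that computes the exact expected number of uniform reads until a target information strand lies in the span of the strands read so far. It is an illustration only and not the paper's construction: it assumes strands are vectors over GF(2) stored as integer bitmasks, whereas the constructions in the paper work over larger fields, and the names `span_gf2` and `expected_reads` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


def span_gf2(vectors):
    """Return every GF(2) linear combination of `vectors` (integer bitmasks)."""
    span = {0}
    for v in vectors:
        span |= {s ^ v for s in span}
    return span


def expected_reads(strands, target):
    """Exact expected number of uniform random reads until `target` is recoverable.

    `strands` lists the n encoded strands (repeats allowed); each read returns
    one of them uniformly at random, with replacement.
    """
    n = len(strands)

    @lru_cache(maxsize=None)
    def expect(seen):
        # `seen` is the frozenset of strand indices read at least once.
        if target in span_gf2(strands[i] for i in seen):
            return Fraction(0)
        unseen = [i for i in range(n) if i not in seen]
        if not unseen:
            raise ValueError("target is not recoverable from this code")
        # From E = 1 + (|seen|/n) E + (1/n) * sum of E over the next states,
        # solving for E gives (n + sum) / (n - |seen|).
        total = sum(expect(seen | {i}) for i in unseen)
        return (n + total) / len(unseen)

    return expect(frozenset())


e1, e2 = 0b01, 0b10                                     # k = 2 information strands
print(expected_reads([e1, e2], e1))                     # 2
print(expected_reads([e1, e1, e2, e2, e1 ^ e2], e1))    # 23/12
```

Storing the two information strands uncoded gives an expectation of exactly $k = 2$, while the five-strand toy code already drops below it ($23/12 \approx 1.917$ for either information strand). This binary example does not reach the optimum of $0.914 \cdot 2$ stated in the abstract; it only shows that redundancy can push the expectation below $k$.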