Correcting One Deletion and One Substitution with a Constant Number of Reads

📅 2026-04-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates the design of reconstruction codes that allow the original sequence to be uniquely recovered from a constant number of distinct noisy reads (5, 9, 11, or 14), where each read is corrupted by a single deletion and a single substitution. By analyzing how the error balls induced by such errors intersect, and by combining combinatorial coding techniques with parity checks of logarithmic size, the authors reduce the redundancy without requiring the number of reads to scale with the sequence length. Specifically, for five reads the redundancy is lowered to $3\log n + 4$; for nine and eleven reads it becomes $2\log n + 12\log\log n + O(1)$ and $\log n + 12\log\log n + O(1)$, respectively; and with fourteen reads, $\log n + 3$ bits of redundancy suffice, only two bits more than the best known codes for eighteen reads.

📝 Abstract

In this paper, we investigate the problem of designing $(n, N; \mathcal{B})$-reconstruction codes for $N\in \{14,11,9,5\}$, where $\mathcal{B}$ is the single-deletion single-substitution ball function that maps a sequence to the set of all sequences obtainable from it via one deletion and one substitution. Such a code is defined by the requirement that, for any two distinct codewords, the intersection of their single-deletion single-substitution balls has size strictly less than the given number of noisy reads $N$. Note that for any $1\le N<N'$, an $(n, N; \mathcal{B})$-reconstruction code is also an $(n, N'; \mathcal{B})$-reconstruction code. It follows that designing $(n, N; \mathcal{B})$-reconstruction codes with low redundancy becomes more challenging as $N$ decreases; in particular, the case $N=1$ is exactly the problem of constructing single-deletion single-substitution correcting codes. To the best of our knowledge, most existing results focus on the case where $N$ is a linear function of $n$, and only a few consider constant $N$. When $N=1$, the best known $(n, 1; \mathcal{B})$-reconstruction codes (single-deletion single-substitution correcting codes) require $(4+o(1))\log n$ redundant bits. In this work, we show that this redundancy can be reduced to $3\log n+4$ when $N=5$. As $N$ increases further to $9$ and $11$, the redundancy can be improved to $2\log n+12\log\log n+O(1)$ and $\log n +12\log \log n+O(1)$, respectively. Finally, for $N=14$, we provide a reconstruction code with $\log n+3$ bits of redundancy, which is only two bits more than the best known $(n, 18; \mathcal{B})$-reconstruction codes.
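
The defining condition above can be checked by brute force for very short sequences. The following is a minimal sketch, not the paper's construction: it assumes a binary alphabet and takes the ball to contain every sequence reachable by one deletion followed by at most one substitution (a common convention; the paper's exact definition may differ). The names `ds_ball` and `is_reconstruction_code` are hypothetical and introduced only for illustration.

```python
from itertools import combinations


def ds_ball(x: str) -> set[str]:
    """Single-deletion single-substitution ball of a binary string x.

    Assumption: one deletion followed by at most one substitution,
    over the alphabet {'0', '1'}.
    """
    ball = set()
    for i in range(len(x)):
        y = x[:i] + x[i + 1:]          # delete the symbol at position i
        ball.add(y)                    # deletion only (no substitution)
        for j in range(len(y)):
            flipped = "1" if y[j] == "0" else "0"
            ball.add(y[:j] + flipped + y[j + 1:])  # substitute position j
    return ball


def is_reconstruction_code(code: list[str], N: int) -> bool:
    """True if every pair of distinct codewords has ball intersection < N."""
    return all(
        len(ds_ball(u) & ds_ball(v)) < N
        for u, v in combinations(code, 2)
    )


if __name__ == "__main__":
    pair = ["0000", "0001"]
    shared = ds_ball(pair[0]) & ds_ball(pair[1])
    print(sorted(shared))                   # sequences both codewords can produce
    print(is_reconstruction_code(pair, 4))  # 4 shared outputs: 4 reads are not enough
    print(is_reconstruction_code(pair, 5))  # a 5th distinct read disambiguates
```

Under these assumptions, the two codewords `0000` and `0001` share four possible outputs, so four distinct reads may fail to tell them apart while five always succeed. This also illustrates the monotonicity noted in the abstract: a code that works for $N$ reads works for any $N' > N$.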

Problem

Research questions and friction points this paper is trying to address.

reconstruction codes

deletion error

substitution error

noisy reads

error-correcting codes

Innovation

Methods, ideas, or system contributions that make the work stand out.

reconstruction codes

deletion-substitution errors

constant reads