🤖 AI Summary
Nanopore sequencing suffers from high error rates, and existing error-correction methods support only a limited set of code classes and incur high decoding complexity. This work proposes two algorithms, PrimerSeeker and SynDe. PrimerSeeker efficiently locates primer sequences in raw reads and is suited to real-time primer detection during sequencing. SynDe is a low-complexity decoder that operates directly on raw reads and supports any linear error-correcting code with a low-complexity graphical representation. It also produces confidence scores that identify which of its outputs are reliable. By combining primer detection, syndrome-guided decoding, and convolutional codes augmented with periodic markers, the approach often approaches or exceeds the error-correction performance of existing algorithms at lower time complexity.
📝 Abstract
Nanopore sequencing technology remains highly error-prone, making efficient error correction essential in DNA-based data storage. Prior work addressed high error rates using convolutional codes whose decoder is coupled to the basecaller, but such approaches accommodate only a limited number of code classes and incur significant decoding complexity. To overcome these limitations, we propose two algorithms: PrimerSeeker, which efficiently detects primer sequences in raw nanopore sequencing reads, and SynDe, a decoder that operates on the same raw reads and supports any linear error-correcting code with a low-complexity graphical representation. PrimerSeeker provides primer location estimates close to those of existing approaches while being better suited to real-time primer detection during sequencing. SynDe performs well with convolutional codes augmented with periodic markers, often approaching or exceeding the performance of existing algorithms at a lower time complexity. Remarkably, the confidence scores produced by SynDe reliably identify which of its outputs should be discarded.
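
For readers unfamiliar with syndromes of linear codes, the following is a minimal textbook sketch, not the SynDe algorithm. It assumes a binary Hamming(7,4) code and hard-decision bits, whereas SynDe works on raw nanopore reads with codes described by a graphical representation. The matrix `H` and the helpers `syndrome` and `correct_single_error` are illustrative names chosen here, not taken from the paper.

```python
# Toy example: syndrome decoding of a binary Hamming(7,4) code.
# Assumption: hard-decision bits and at most one bit flip per word.
# Column j (1-indexed) of H is the binary expansion of j,
# with the least significant bit in the first row.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(word):
    """Return H * word over GF(2) as a list of three bits."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]


def correct_single_error(word):
    """Flip the bit indicated by the syndrome, if the syndrome is nonzero."""
    s = syndrome(word)
    pos = s[0] + 2 * s[1] + 4 * s[2]  # 1-indexed error position, 0 if none
    fixed = list(word)
    if pos:
        fixed[pos - 1] ^= 1
    return fixed


codeword = [1, 1, 1, 0, 0, 0, 0]   # satisfies H * c = 0 over GF(2)
received = list(codeword)
received[4] ^= 1                   # flip the bit at position 5

print(syndrome(codeword))          # [0, 0, 0]
print(syndrome(received))          # [1, 0, 1]
print(correct_single_error(received) == codeword)  # True
```

A zero syndrome means the word satisfies every parity check; a nonzero syndrome depends only on the error pattern, which is what makes it useful for guiding a decoder.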