🤖 AI Summary
In DNA data storage, insertion-deletion-substitution (IDS) synchronization errors introduced during synthesis and sequencing render conventional maximum-likelihood decoding computationally prohibitive and heavily dependent on accurate channel modeling. To address this, we propose the Neural Polar Decoder (NPD), the first architecture that integrates polar code structure with differentiable neural networks for model-agnostic, sample-driven, low-complexity decoding. NPD is built upon a differentiable polar graph, eliminating the need for explicit channel modeling while supporting general IDS channels and real nanopore/multi-read sequencing data. It further outputs mutual information estimates that can be used to optimize code design and the input distribution. On deletion channels, NPD achieves near-optimal decoding performance with only *O*(*AN* log *N*) complexity, where *A* is a tunable parameter independent of the channel. In realistic DNA storage benchmarks, NPD achieves performance on par with or superior to state-of-the-art methods while using significantly fewer parameters.
📝 Abstract
Synchronization errors, such as insertions and deletions, present a fundamental challenge in DNA-based data storage systems, arising from both synthesis and sequencing noise. These channels are often modeled as insertion-deletion-substitution (IDS) channels, for which designing maximum-likelihood decoders is computationally expensive. In this work, we propose a data-driven approach based on neural polar decoders (NPDs) to design low-complexity decoders for channels with synchronization errors. The proposed architecture enables decoding over IDS channels with reduced complexity $O(AN \log N)$, where $A$ is a tunable parameter independent of the channel. NPDs require only sample access to the channel and can be trained without an explicit channel model. Additionally, NPDs provide mutual information (MI) estimates that can be used to optimize input distributions and code design. We demonstrate the effectiveness of NPDs on both synthetic deletion and IDS channels. For deletion channels, we show that NPDs achieve near-optimal decoding performance and accurate MI estimation, with significantly lower complexity than trellis-based decoders. We also provide numerical estimates of the channel capacity for the deletion channel. We extend our evaluation to realistic DNA storage settings, including channels with multiple noisy reads and real-world Nanopore sequencing data. Our results show that NPDs match or surpass the performance of existing methods while using significantly fewer parameters than the state-of-the-art. These findings highlight the promise of NPDs for robust and efficient decoding in DNA data storage systems.
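To make "sample access to the channel" concrete, the following is a minimal sketch of a synthetic IDS channel that produces (input, output) pairs of the kind a data-driven decoder could be trained on. It is an illustration only, not the paper's simulator: the function names `ids_channel` and `sample_pairs`, the i.i.d. per-symbol error model, the order in which the three error types are applied, and the default probabilities are all assumptions made here.

```python
import random

ALPHABET = "ACGT"  # DNA bases; assumed alphabet for this sketch


def ids_channel(x, p_ins=0.01, p_del=0.01, p_sub=0.01, rng=None):
    """Pass string x through a simple i.i.d. IDS channel (hypothetical model).

    For each input symbol, independently:
      1. with probability p_ins, a uniformly random symbol is inserted before it;
      2. with probability p_del, the symbol itself is deleted;
      3. otherwise, with probability p_sub, it is replaced by a different symbol.
    """
    if rng is None:
        rng = random.Random()
    out = []
    for symbol in x:
        if rng.random() < p_ins:
            out.append(rng.choice(ALPHABET))  # insertion
        if rng.random() < p_del:
            continue  # deletion: emit nothing for this symbol
        if rng.random() < p_sub:
            # substitution: choose uniformly among the other symbols
            out.append(rng.choice([a for a in ALPHABET if a != symbol]))
        else:
            out.append(symbol)  # transmitted unchanged
    return "".join(out)


def sample_pairs(n_pairs, length, seed=0, **channel_kwargs):
    """Draw n_pairs of (x, y): uniform random inputs and their channel outputs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        x = "".join(rng.choice(ALPHABET) for _ in range(length))
        y = ids_channel(x, rng=rng, **channel_kwargs)
        pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    for x, y in sample_pairs(3, 16, p_ins=0.05, p_del=0.05, p_sub=0.05):
        print(x, "->", y)
```

Note that the output length is random, which is exactly what breaks symbol-by-symbol alignment between input and output. Setting `p_ins=0` and `p_sub=0` reduces this sketch to a pure deletion channel.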