SCONE: A Practical, Constraint-Aware Plug-in for Latent Encoding in Learned DNA Storage

2026-02-05
AI Summary
Existing neural compression methods for DNA storage typically transcode latent representations into DNA with a naive binary-to-quaternary mapping. This step ignores both entropy optimization and biochemical constraints, which leads to suboptimal efficiency. This work proposes SCONE, a plug-in module presented as the first to enable constraint-aware, end-to-end DNA encoding directly in the latent space. By performing quaternary arithmetic coding natively in the DNA base space, SCONE dynamically adapts probability distributions so that key biochemical constraints, such as GC balance and homopolymer suppression, are satisfied deterministically without post-processing. The approach is fully invertible, compatible with existing hyperprior-based models, and incurs negligible computational overhead (latency increase <2%), offering a latent-agnostic codec interface for learned DNA storage systems.
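To make the mismatch concrete, the following sketch transcodes bits to bases with a fixed two-bits-per-base table and then checks the two constraints named above. The table (00→A, 01→C, 10→G, 11→T) is an assumed conventional mapping, not one specified here, and the helper names `naive_transcode`, `gc_fraction`, and `max_homopolymer` are hypothetical. Because the mapping ignores context, a run of identical bit pairs becomes a homopolymer directly and can push the GC fraction to an extreme.

```python
from itertools import groupby

# Hypothetical fixed 2-bit-per-base mapping, assumed for illustration only.
NAIVE_MAP = {"00": "A", "01": "C", "10": "G", "11": "T"}


def naive_transcode(bits: str) -> str:
    """Map a bit string to DNA, two bits per base, with no constraint handling."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(NAIVE_MAP[bits[i:i + 2]] for i in range(0, len(bits), 2))


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


if __name__ == "__main__":
    dna = naive_transcode("0000000011111111")
    print(dna, gc_fraction(dna), max_homopolymer(dna))
```

Running it prints `AAAATTTT 0.0 4`: sixteen bits become a sequence with no G or C at all and two four-base homopolymers, exactly the kind of output that otherwise needs a separate constraint layer or post-hoc correction.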


Abstract
DNA storage has matured from concept to a practical stage, yet its integration with neural compression pipelines remains inefficient. Early DNA encoders applied redundancy-heavy constraint layers atop raw binary data: workable but primitive. Recent neural codecs compress data into learned latent representations with rich statistical structure, yet still convert these latents to DNA via naive binary-to-quaternary transcoding, discarding the entropy model's optimization. This mismatch undermines compression efficiency and complicates the encoding stack. We propose SCONE, a plug-in module that collapses latent compression and DNA encoding into a single step. SCONE performs quaternary arithmetic coding directly on the latent representation, emitting DNA bases. Its Constraint-Aware Adaptive Coding module dynamically steers the entropy encoder's learned probability distribution to enforce biochemical constraints, namely Guanine-Cytosine (GC) balance and homopolymer suppression, deterministically during encoding, eliminating post-hoc correction. The design preserves full reversibility and exploits the hyperprior model's learned priors without modification. Experiments show that SCONE achieves near-perfect constraint satisfaction with negligible computational overhead (<2% latency), establishing a latent-agnostic interface for end-to-end DNA-compatible learned codecs.
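The abstract does not spell out how the Constraint-Aware Adaptive Coding module reshapes the distribution, so the following is only a minimal, hypothetical sketch of one way such steering could work. It assumes two simple rules: forbid the base that would extend a homopolymer past `max_run`, and forbid G/C (or A/T) once a running GC-minus-AT count reaches `+max_imbalance` (or `-max_imbalance`). That bounded running imbalance is a swapped-in stand-in for a GC-content bound, not the paper's rule. The names `steer`, `trailing_run`, and `greedy_sequence` and the limits `max_run=3` and `max_imbalance=2` are illustrative assumptions. Because the mask depends only on bases already emitted, a decoder that has recovered the same prefix can recompute the identical distribution, which is the property that lets this kind of steering stay reversible.

```python
from __future__ import annotations

BASES = "ACGT"
GC = frozenset("GC")
AT = frozenset("AT")


def trailing_run(prefix: str) -> tuple[str | None, int]:
    """Return the last base of `prefix` and the length of its trailing run."""
    if not prefix:
        return None, 0
    last = prefix[-1]
    return last, len(prefix) - len(prefix.rstrip(last))


def steer(probs: dict[str, float], prefix: str,
          max_run: int = 3, max_imbalance: int = 2) -> dict[str, float]:
    """Mask and renormalise a next-base distribution (hypothetical rule set).

    Forbids the base that would extend a homopolymer beyond `max_run`, and
    forbids G/C (or A/T) once the running GC-minus-AT count reaches
    +max_imbalance (or -max_imbalance).
    """
    last, run = trailing_run(prefix)
    imbalance = sum(1 if base in GC else -1 for base in prefix)
    allowed = set(BASES)
    if last is not None and run >= max_run:
        allowed.discard(last)
    if imbalance >= max_imbalance:
        allowed -= GC
    elif imbalance <= -max_imbalance:
        allowed -= AT
    # At most three bases are ever removed, so `allowed` is never empty.
    masked = {b: (probs[b] if b in allowed else 0.0) for b in BASES}
    total = sum(masked.values())
    if total == 0.0:
        # The model gave no mass to any allowed base: fall back to uniform.
        return {b: (1.0 / len(allowed) if b in allowed else 0.0) for b in BASES}
    return {b: p / total for b, p in masked.items()}


def greedy_sequence(probs: dict[str, float], length: int, **limits: int) -> str:
    """Emit the most probable allowed base at each step (a demo, not a coder)."""
    seq = ""
    for _ in range(length):
        steered = steer(probs, seq, **limits)
        seq += max(BASES, key=lambda b: steered[b])
    return seq
```

For example, with a model that heavily favours G (`{"A": 0.05, "C": 0.1, "G": 0.8, "T": 0.05}`), `greedy_sequence(..., 8)` yields `GGAGAGAG`: an A is inserted whenever the running imbalance hits the limit. In an actual arithmetic coder, the steered distribution would replace the model's distribution at each coding step rather than being decoded greedily; the greedy loop only shows that the mask keeps every prefix within the assumed limits.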
Problem

Research questions and friction points this paper is trying to address.

DNA storage
latent encoding
biochemical constraints
neural compression
entropy coding
Innovation

Methods, ideas, or system contributions that make the work stand out.

DNA storage
latent encoding
constraint-aware coding
arithmetic coding
neural compression
Cihan Ruan
Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA, USA
Lebin Zhou
Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA, USA
Rongduo Han
Nankai University
Linyi Han
Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA, USA
Bingqing Zhao
Department of Genetics, Stanford School of Medicine, Stanford University, Palo Alto, CA, USA
Chenchen Zhu
Research Scientist, Meta Reality Labs
Computer Vision, Deep Learning, Perception
Wei Jiang
Futurewei Technologies Inc., Santa Clara, CA, USA
Wei Wang
Futurewei Technologies Inc., Santa Clara, CA, USA
Nam Ling
Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA, USA