Variations on the Problem of Identifying Spectrum-Preserving String Sets

📅 2026-02-22
🤖 AI Summary
This work proposes necklace covers, a model on de Bruijn graphs for constructing compact string sets that preserve the exact k-mer spectrum. Instead of restricting the representation to a path cover, a necklace cover consists of cycles with tree-like pendant structures attached. The authors give a greedy algorithm that constructs such a cover and, under certain conditions, guarantees optimality in the cumulative size of the final representation. Experiments on real genomic datasets indicate that the minimum necklace cover yields smaller representations than Eulertigs and compression comparable to Masked Superstrings, while keeping the k-mer spectrum exact.

📝 Abstract
In computational genomics, many analyses rely on efficient storage and traversal of $k$-mers, motivating compact representations such as spectrum-preserving string sets (SPSS), which store strings whose $k$-mer spectrum matches that of the input. Existing approaches, including Unitigs, Eulertigs and Matchtigs, model this task as a path cover problem on the de Bruijn graph. We extend this framework from paths to branching structures by introducing necklace covers, which combine cycles and tree-like attachments (pendants). We present a greedy algorithm that constructs a necklace cover while guaranteeing, under certain conditions, optimality in the cumulative size of the final representation. Experiments on real genomic datasets indicate that the minimum necklace cover achieves smaller representations than Eulertigs and comparable compression to the Masked Superstrings approach, while maintaining exactness of the $k$-mer spectrum.
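To make the spectrum-preservation requirement concrete, here is a minimal Python sketch. It is not taken from the paper: the function names (`kmer_spectrum`, `is_spectrum_preserving`, `cumulative_length`) and the toy reads are hypothetical. The sketch also assumes the spectrum is a plain set of forward k-mers, ignoring reverse complements and multiplicities, which real SPSS tools typically handle.

```python
def kmer_spectrum(strings, k):
    """Return the set of all length-k substrings (k-mers) of the given strings.

    Assumption: forward strand only, multiplicities ignored.
    """
    return {s[i:i + k] for s in strings for i in range(len(s) - k + 1)}


def is_spectrum_preserving(candidate, original, k):
    """Check that the candidate string set has exactly the k-mers of the original."""
    return kmer_spectrum(candidate, k) == kmer_spectrum(original, k)


def cumulative_length(strings):
    """Total number of characters stored, a simple proxy for representation size."""
    return sum(len(s) for s in strings)


# Hypothetical toy input: three overlapping reads, k = 3.
reads = ["ACGT", "CGTA", "GTAC"]
k = 3

# A single string that walks through every k-mer once (a path in the de Bruijn graph).
merged = ["ACGTAC"]

# A string set that misses the k-mer "TAC" and therefore is not spectrum-preserving.
truncated = ["ACGTA"]

print(is_spectrum_preserving(merged, reads, k))
print(is_spectrum_preserving(truncated, reads, k))
print(cumulative_length(reads), cumulative_length(merged))
```

Under these assumptions, `merged` stores the same four k-mers as `reads` in half the characters. Path-cover approaches exploit exactly this kind of saving, and necklace covers aim to push it further by allowing cycles and pendants.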
Problem

Research questions and friction points this paper is trying to address.

spectrum-preserving string sets
k-mer spectrum
de Bruijn graph
string compression
computational genomics
Innovation

Methods, ideas, or system contributions that make the work stand out.

necklace cover
spectrum-preserving string sets
de Bruijn graph
k-mer compression
branching structures