Minimizers in Semi-Dynamic Strings

📅 2025-02-24
🤖 AI Summary
This work addresses the efficient maintenance of minimizers over semi-dynamic strings, which support the insertion or deletion of a letter at either end. The minimizer of a window is the leftmost starting position of its smallest length-$k$ substring under a given total order on $k$-mers. We propose a dynamic data structure for minimizer indexing under such operations. To overcome the conventional $O(w)$ working-space barrier (where $w$ is the window size), we design an index based on ordered $k$-mer comparison, double-ended queue management, and lazy updates, which reduces the working space to $O(\sqrt{w})$ and is, to our knowledge, the first strongly sublinear-space solution. The structure answers minimizer queries in $O(1)$ time with amortized $O(1)$ time per update, and computes the full minimizer set of a string of length $n$ in $O(n)$ time. The theoretical results are complemented with a concrete application and an experimental evaluation.

📝 Abstract
Minimizers sampling is one of the most widely-used mechanisms for sampling strings. Let $S=S[0]\ldots S[n-1]$ be a string over an alphabet $\Sigma$. In addition, let $w\geq 2$ and $k\geq 1$ be two integers and $\rho=(\Sigma^k,\leq)$ be a total order on $\Sigma^k$. The minimizer of window $X=S[i\mathinner{.\,.} i+w+k-2]$ is the smallest position in $[i,i+w-1]$ where the smallest length-$k$ substring of $S[i\mathinner{.\,.} i+w+k-2]$ based on $\rho$ starts. The set of minimizers for all $i\in[0,n-w-k+1]$ is the set $\mathcal{M}_{w,k,\rho}(S)$ of the minimizers of $S$.

The set $\mathcal{M}_{w,k,\rho}(S)$ can be computed in $\mathcal{O}(n)$ time. The folklore algorithm for this computation computes the minimizer of every window in $\mathcal{O}(1)$ amortized time using $\mathcal{O}(w)$ working space. It is thus natural to pose the following two questions:

Question 1: Can we efficiently support other dynamic updates on the window?
Question 2: Can we improve on the $\mathcal{O}(w)$ working space?

We answer both questions in the affirmative:

1. We term a string $X$ semi-dynamic when one is allowed to insert or delete a letter at any of its ends. We show a data structure that maintains a semi-dynamic string $X$ and supports minimizer queries in $X$ in $\mathcal{O}(1)$ time with amortized $\mathcal{O}(1)$ time per update operation.
2. We show that this data structure can be modified to occupy strongly sublinear space without increasing the asymptotic complexity of its operations. To the best of our knowledge, this yields the first algorithm for computing $\mathcal{M}_{w,k,\rho}(S)$ in $\mathcal{O}(n)$ time using $\mathcal{O}(\sqrt{w})$ working space.

We complement our theoretical results with a concrete application and an experimental evaluation.

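
To make the definition and the folklore bound concrete, here is a minimal Python sketch of the sliding-window computation with a monotone deque. It is an illustration under assumptions, not the paper's code: the function name `minimizers` is hypothetical, the order $\rho$ is taken to be plain lexicographic order, and $k$-mers are compared as string slices, so each comparison costs $O(k)$ rather than the constant time the $\mathcal{O}(1)$ amortized bound presumes. The deque can hold up to $w$ positions, which is the $\mathcal{O}(w)$ working space the abstract refers to.

```python
from collections import deque


def minimizers(s, w, k):
    """Return the sorted minimizer positions of s for parameters w and k.

    Assumption: k-mers are compared lexicographically as Python strings.
    """
    result = set()
    candidates = deque()  # k-mer start positions; their k-mers never decrease front to back
    for j in range(len(s) - k + 1):      # j: start of the newest k-mer
        kmer = s[j:j + k]
        # A strictly larger candidate can never win again once kmer has arrived.
        # Equal candidates are kept, so ties resolve to the leftmost position.
        while candidates and s[candidates[-1]:candidates[-1] + k] > kmer:
            candidates.pop()
        candidates.append(j)
        i = j - w + 1                    # start of the window whose last k-mer starts at j
        if i >= 0:
            while candidates[0] < i:     # evict positions that slid out of the window
                candidates.popleft()
            result.add(candidates[0])
    return sorted(result)


print(minimizers("banana", 2, 2))  # [1, 3]
```

For `"banana"` with $w=2$ and $k=2$, the four windows contain the $k$-mer pairs (ba, an), (an, na), (na, an), and (an, na), so the minimizer positions are 1, 1, 3, and 3.
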
Problem

Research questions and friction points this paper is trying to address.

Efficiently support dynamic updates on string windows.
Improve working space complexity for minimizer computation.
Develop sublinear space data structure for semi-dynamic strings.
Innovation

Methods, ideas, or system contributions that make the work stand out.

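Minimizer sampling maintained under updates at both ends of the string, not only over a sliding window.
Data structure for semi-dynamic strings with $\mathcal{O}(1)$-time minimizer queries and amortized $\mathcal{O}(1)$ time per update.
Strongly sublinear working space: $\mathcal{M}_{w,k,\rho}(S)$ computed in $\mathcal{O}(n)$ time using $\mathcal{O}(\sqrt{w})$ working space.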
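
To illustrate what a semi-dynamic interface looks like, the following sketch implements a double-ended queue with minimum queries using the textbook two-stack technique. This is not the paper's data structure, which the abstract does not describe in detail: the class name `MinDeque` and its methods are hypothetical, the space used is linear in the number of stored items (not sublinear), and the translation from letter updates to $k$-mer updates is left out. Items are assumed to be `(kmer, position)` tuples, so that the minimum is the smallest $k$-mer with ties broken by the leftmost position.

```python
class MinDeque:
    """Deque with O(1) min queries and amortized O(1) updates at both ends."""

    def __init__(self):
        # Each stack stores (item, min of the stack up to and including item).
        self._front = []  # top of stack = leftmost element
        self._back = []   # top of stack = rightmost element

    @staticmethod
    def _push(stack, item):
        best = item if not stack else min(item, stack[-1][1])
        stack.append((item, best))

    def push_front(self, item):
        self._push(self._front, item)

    def push_back(self, item):
        self._push(self._back, item)

    def _rebalance(self, empty, full):
        # Move the half of `full` nearest to the empty end into `empty`.
        items = [item for item, _ in full]  # bottom to top
        half = (len(items) + 1) // 2
        full.clear()
        for item in reversed(items[:half]):
            self._push(empty, item)
        for item in items[half:]:
            self._push(full, item)

    def _pop(self, stack, other):
        if not stack:
            if not other:
                raise IndexError("pop from an empty MinDeque")
            self._rebalance(stack, other)
        return stack.pop()[0]

    def pop_front(self):
        return self._pop(self._front, self._back)

    def pop_back(self):
        return self._pop(self._back, self._front)

    def min(self):
        tops = [stack[-1][1] for stack in (self._front, self._back) if stack]
        if not tops:
            raise IndexError("min of an empty MinDeque")
        return min(tops)

    def __len__(self):
        return len(self._front) + len(self._back)


d = MinDeque()
for pos, kmer in enumerate(["ba", "an", "na"]):
    d.push_back((kmer, pos))
print(d.min())        # ('an', 1)
print(d.pop_front())  # ('ba', 0)
```

Splitting the non-empty stack in half whenever the other one runs dry is what keeps the cost of the occasional rebuild amortized constant per operation.
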