Dynamic Grammar-Compressed Self-Index in $δ$-Optimal Space

📅 2026-04-27

🤖 AI Summary
Existing dynamic self-indexes fail to combine δ-optimal space, efficient locate queries, fast updates, and update times that are independent of the longest common prefix (LCP). This work proposes the dynamic RR-index, which the authors describe as the first dynamic self-index to attain δ-optimal space. It is built on the restricted recompression run-length straight-line program (RLSLP). The index supports insertion, deletion, and locate operations on highly repetitive strings, with update times that do not depend on the LCP. On eleven highly repetitive datasets, it is up to 77× faster than the dynamic r-index on updates and up to 11× faster than other dynamic indexes on locate queries.
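As a rough illustration of what a run-length straight-line program is, the sketch below models a generic RLSLP with three rule kinds: a terminal, a concatenation of two symbols, and a run that repeats one symbol a fixed number of times. It is not the restricted recompression construction from the paper; the class names, the `RULES` grammar, and the helper functions are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Terminal:
    char: str  # assumed to be a single character


@dataclass(frozen=True)
class Pair:
    left: str  # name of the left child symbol
    right: str  # name of the right child symbol


@dataclass(frozen=True)
class Run:
    child: str  # name of the repeated symbol
    count: int  # number of repetitions


Rule = Union[Terminal, Pair, Run]


def expand(rules: Dict[str, Rule], symbol: str) -> str:
    """Decompress the string derived from `symbol` (for illustration only)."""
    rule = rules[symbol]
    if isinstance(rule, Terminal):
        return rule.char
    if isinstance(rule, Pair):
        return expand(rules, rule.left) + expand(rules, rule.right)
    return expand(rules, rule.child) * rule.count


def expanded_length(
    rules: Dict[str, Rule], symbol: str, memo: Optional[Dict[str, int]] = None
) -> int:
    """Length of the derived string, computed without decompressing it."""
    if memo is None:
        memo = {}
    if symbol not in memo:
        rule = rules[symbol]
        if isinstance(rule, Terminal):
            memo[symbol] = 1
        elif isinstance(rule, Pair):
            memo[symbol] = expanded_length(rules, rule.left, memo) + expanded_length(
                rules, rule.right, memo
            )
        else:
            memo[symbol] = rule.count * expanded_length(rules, rule.child, memo)
    return memo[symbol]


# Hypothetical grammar: five rules derive the string (ab)^4 a.
RULES: Dict[str, Rule] = {
    "A": Terminal("a"),
    "B": Terminal("b"),
    "X": Pair("A", "B"),
    "Y": Run("X", 4),
    "S": Pair("Y", "A"),
}

if __name__ == "__main__":
    print(expand(RULES, "S"))
    print(expanded_length(RULES, "S"))
```

The run rule is what distinguishes an RLSLP from a plain straight-line program: a repetition costs one rule regardless of how many times the child is repeated.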

📝 Abstract
A compressed self-index stores a string in compressed form while supporting locate queries without decompression. For highly repetitive strings (arising in web crawls, versioned documents, and genomic collections), static self-indexes can match the $δ$-optimal lower bound of $Ω(δ\log(n \log σ/ (δ\log n)) \log n)$ bits up to constant factors, where $n$ is the string length, $σ$ is the alphabet size, and $δ$ is the substring complexity. Their dynamic counterparts, however, remain scarce: every existing dynamic self-index either fails to attain $δ$-optimal space, pays at least $Θ(\log n)$ time per reported occurrence during locate, or exposes the longest common prefix (LCP) of the text inside its update time. We present the dynamic RR-index, a dynamic grammar-compressed self-index built on the restricted recompression run-length straight-line program (RLSLP). To our knowledge, it is the first dynamic self-index to attain $δ$-optimal space. The index occupies expected $O(δ\log(n \log σ/ (δ\log n)) \log n)$ bits, answers locate queries in expected $O(m + \log m \log^{2} n + \mathit{occ} (\log n / \log \log n))$ time (where $m$ is the pattern length and $\mathit{occ}$ is the number of occurrences), and supports insertions and deletions of a length-$m'$ substring in expected amortized $O(m' \log^{2} n + \log^{3} n)$ time, with no dependence on the LCP. On eleven highly repetitive corpora, including a $37$ GB Wikipedia dump and a $59$ GB human-chromosome collection, the dynamic RR-index is up to $77\times$ faster than the dynamic r-index on updates and up to $11\times$ faster than other dynamic indexes on locate.
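To make the quantity $δ$ concrete, the sketch below computes it by brute force, assuming the standard definition of substring complexity, $δ = \max_{k \ge 1} d_k / k$, where $d_k$ is the number of distinct length-$k$ substrings of the string. The abstract names $δ$ but does not restate this definition, so the formula is an assumption here, and the function name `substring_complexity` is hypothetical. The code is meant for small inputs only and is unrelated to how the index itself is built.

```python
from fractions import Fraction


def substring_complexity(text: str) -> Fraction:
    """Brute-force delta = max over k of (distinct length-k substrings) / k."""
    n = len(text)
    best = Fraction(0)
    for k in range(1, n + 1):
        # Collect every length-k substring; the set keeps only distinct ones.
        distinct = len({text[i:i + k] for i in range(n - k + 1)})
        best = max(best, Fraction(distinct, k))
    return best


if __name__ == "__main__":
    for sample in ("aaaaaaaa", "abababab", "0001011100"):
        print(sample, substring_complexity(sample))
```

A highly repetitive string such as `abababab` has few distinct substrings at every length, so its $δ$ stays small even as the string grows, which is why space bounds expressed in $δ$ suit such inputs.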
Problem

Research questions and friction points this paper is trying to address.

dynamic self-index
δ-optimal space
highly repetitive strings
locate queries
grammar compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic self-index
δ-optimal space
grammar compression
RLSLP
locate queries