Dynamic r-index: An Updatable Self-Index for Highly Repetitive Strings

📅 2025-04-28
🤖 AI Summary
This work addresses dynamic pattern matching on highly repetitive strings, supporting efficient insertions and deletions. Method: We propose the first dynamic variant of the r-index, built upon the run-length encoded Burrows–Wheeler Transform (RLBWT) and a dynamic LCP array, integrating balanced binary search trees and differential encoding for efficient index maintenance. Contribution/Results: The dynamic r-index supports edit operations in $\mathcal{O}((m + L_{\max}) \log n)$ time while using $\mathcal{O}(r)$ words of space and answering locate queries in $\mathcal{O}((m + occ) \log n)$ time. Experiments on diverse highly repetitive datasets indicate that updates and queries are practical and that space scales with the number $r$ of RLBWT runs, giving a favorable trade-off among compression, dynamic update support, and query efficiency.

📝 Abstract
A self-index is a compressed data structure that supports locate queries, that is, reporting all positions where a given pattern occurs in a string. While many self-indexes have been proposed, developing dynamically updatable ones supporting string insertions and deletions remains a challenge. The r-index (Gagie et al., SODA'18) is a representative static self-index based on the run-length Burrows-Wheeler transform (RLBWT), designed for highly repetitive strings (those with many repeated substrings). We present the dynamic r-index, an extension of the r-index that supports locate queries in $\mathcal{O}((m + occ) \log n)$ time using $\mathcal{O}(r)$ words, where $n$ is the length of the string $T$, $m$ is the pattern length, $occ$ is the number of occurrences, and $r$ is the number of runs in the RLBWT of $T$. It supports string insertions and deletions in $\mathcal{O}((m + L_{\max}) \log n)$ time, where $L_{\max}$ is the maximum value in the LCP array of $T$. The average running time is $\mathcal{O}((m + L_{\mathrm{avg}}) \log n)$, where $L_{\mathrm{avg}}$ is the average LCP value. We experimentally evaluated the dynamic r-index on various highly repetitive strings and demonstrated its practicality.
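To make the parameters in these bounds concrete, the following minimal Python sketch computes $r$, $L_{\max}$, and $L_{\mathrm{avg}}$ for a small repetitive string. It uses naive textbook constructions of the suffix array, BWT, and LCP array, not the paper's data structures. The helper names, the example string, and the `$` sentinel are assumptions chosen for illustration.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a tiny example only.
    return sorted(range(len(text)), key=lambda i: text[i:])

def bwt(text, sa):
    # BWT[i] is the character that cyclically precedes the i-th smallest suffix.
    return "".join(text[i - 1] for i in sa)

def count_runs(s):
    # r = number of maximal runs of equal characters in s.
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])

def lcp_array(text, sa):
    # LCP[i] = longest common prefix length of suffixes sa[i-1] and sa[i]; LCP[0] = 0.
    def lcp(a, b):
        k = 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1
        return k
    return [0] + [lcp(sa[i - 1], sa[i]) for i in range(1, len(sa))]

text = "abababab$"            # '$' is an assumed end-of-string sentinel
sa = suffix_array(text)
bwt_str = bwt(text, sa)       # "bbbb$aaaa"
lcp = lcp_array(text, sa)     # [0, 0, 2, 4, 6, 0, 1, 3, 5]

r = count_runs(bwt_str)       # 3 runs for n = 9 characters
l_max = max(lcp)              # 6
l_avg = sum(lcp) / len(lcp)   # 21 / 9
print(r, l_max, l_avg)
```

On this input the BWT has only 3 runs although the string has 9 characters, while the largest LCP value is 6. This is the kind of string where the space term $r$ is small but the $L_{\max}$ term in the update time can be comparatively large.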
Problem

Research questions and friction points this paper is trying to address.

Develops a dynamic self-index for repetitive strings
Supports efficient locate queries and string updates
Extends the r-index with insertion and deletion capability
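The operations named in the points above can be pinned down with a naive reference model. The sketch below stores the text uncompressed and scans it, so it only illustrates what locate, insert, and delete are expected to return. It is not the dynamic r-index and has none of its time or space guarantees. The class name `NaiveDynamicText` and its method signatures are hypothetical.

```python
class NaiveDynamicText:
    """Reference semantics only: uncompressed text, linear scans (hypothetical API)."""

    def __init__(self, text=""):
        self.text = text

    def locate(self, pattern):
        # Report every 0-based starting position of pattern, overlaps included.
        hits = []
        start = self.text.find(pattern)
        while start != -1:
            hits.append(start)
            start = self.text.find(pattern, start + 1)
        return hits

    def insert(self, pos, s):
        # Insert string s so that it starts at position pos.
        self.text = self.text[:pos] + s + self.text[pos:]

    def delete(self, pos, length):
        # Delete the substring of the given length starting at position pos.
        self.text = self.text[:pos] + self.text[pos + length:]


idx = NaiveDynamicText("abracadabra")
before = idx.locate("abra")         # [0, 7]
idx.insert(11, "abra")              # text becomes "abracadabraabra"
after_insert = idx.locate("abra")   # [0, 7, 11]
idx.delete(0, 4)                    # text becomes "cadabraabra"
after_delete = idx.locate("abra")   # [3, 7]
```

A model like this can serve as a correctness oracle when testing a compressed dynamic index: both should return the same position sets after the same sequence of updates.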
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic r-index supports updatable self-indexing
Utilizes run-length Burrows-Wheeler transform
Efficiently handles highly repetitive strings
Takaaki Nishimoto
RIKEN Center for Advanced Intelligence Project, Japan
Yasuo Tabei
Team Leader at RIKEN-AIP
Data mining, machine learning, data structures