Faster run-length compressed suffix arrays

📅 2024-08-08
🤖 AI Summary
This paper addresses the cost of backward-extension queries in run-length compressed suffix arrays (RLCSAs): given a character $a$ and the suffix-array (SA) interval for a pattern $P$, find the SA interval for $aP$. We apply a result by Nishimoto and Tabei (ICALP 2021) to replace rank queries on sparse bitvectors with a constant number of select queries, which removes the additive $O(\log \log n)$ term from the query time. The space bound stays at $O(r \log(n/r) + r \log \sigma + \sigma)$ bits, where $r$ is the number of runs in the Burrows-Wheeler transform (BWT), while the query time improves from $O(\log r_a + \log \log n)$ to $O(\log r_a)$, where $r_a$ is the number of runs of copies of $a$ in the BWT. We also review two-level indexing and discuss how the faster RLCSA may help improve it.

📝 Abstract
We review how we can store a run-length compressed suffix array (RLCSA) for a text $T$ of length $n$ over an alphabet of size $\sigma$ whose Burrows-Wheeler Transform (BWT) consists of $r$ runs in $O\left(r \log (n / r) + r \log \sigma + \sigma\right)$ bits such that later, given character $a$ and the suffix array interval for $P$, we can find the suffix-array (SA) interval for $a P$ in $O(\log r_a + \log \log n)$ time, where $r_a$ is the number of runs of copies of $a$ in the BWT. We then show how to modify the RLCSA such that we find the SA interval for $a P$ in only $O(\log r_a)$ time, without increasing its asymptotic space bound. Our key idea is applying a result by Nishimoto and Tabei (ICALP 2021) and replacing rank queries on sparse bitvectors by a constant number of select queries. Finally, we review two-level indexing and discuss how our faster RLCSA may be useful in improving it.
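To make the query concrete, here is a minimal Python sketch of the backward step the abstract describes: given a character $a$ and the SA interval for $P$, return the SA interval for $aP$. It is an illustration under stated assumptions, not the paper's data structure. The names `bwt_of`, `RunLengthBWT`, `rank`, `backward_step`, and `count` are hypothetical, the BWT is built naively, and the per-character run tables are plain Python lists, so the sketch does not attain the compressed space bound and does not implement the select-based technique of Nishimoto and Tabei. It only shows that the step reduces to two rank queries, each answered by a binary search over the $r_a$ runs of $a$.

```python
from bisect import bisect_right
from collections import defaultdict


def bwt_of(text):
    """Naive BWT via sorted suffixes (illustration only).

    Assumes text ends with a unique sentinel smaller than every other character.
    """
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in sa)


class RunLengthBWT:
    """Per-character run tables over a BWT (hypothetical sketch, uncompressed)."""

    def __init__(self, bwt):
        self.n = len(bwt)
        self.starts = defaultdict(list)   # char -> start position of each of its runs
        self.lengths = defaultdict(list)  # char -> length of each of its runs
        self.before = defaultdict(list)   # char -> copies of char before each run
        totals = defaultdict(int)
        i = 0
        while i < self.n:
            j = i
            while j < self.n and bwt[j] == bwt[i]:
                j += 1
            c = bwt[i]
            self.starts[c].append(i)
            self.lengths[c].append(j - i)
            self.before[c].append(totals[c])
            totals[c] += j - i
            i = j
        # C[c] = number of characters in the text that are smaller than c.
        self.C = {}
        smaller = 0
        for c in sorted(totals):
            self.C[c] = smaller
            smaller += totals[c]

    def rank(self, a, i):
        """Number of copies of a in bwt[0:i], by binary search over the runs of a."""
        k = bisect_right(self.starts[a], i) - 1  # last run of a starting at or before i
        if k < 0:
            return 0
        return self.before[a][k] + min(i - self.starts[a][k], self.lengths[a][k])

    def backward_step(self, a, lo, hi):
        """Map the half-open SA interval [lo, hi) for P to the interval for aP."""
        if a not in self.C:
            return (0, 0)
        return (self.C[a] + self.rank(a, lo), self.C[a] + self.rank(a, hi))

    def count(self, pattern):
        """Count occurrences of pattern by repeated backward steps."""
        lo, hi = 0, self.n
        for a in reversed(pattern):
            lo, hi = self.backward_step(a, lo, hi)
            if lo >= hi:
                return 0
        return hi - lo


index = RunLengthBWT(bwt_of("banana$"))
print(index.backward_step("a", 0, 7))
print(index.count("ana"))
```

Intervals are half-open here, so the empty pattern corresponds to `[0, n)`. In a compressed RLCSA the run tables would be stored as sparse bitvectors rather than lists, which is where the rank and select queries discussed in the abstract come in.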
Problem

Research questions and friction points this paper is trying to address.

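Speed up queries on run-length compressed suffix arrays (RLCSAs) without increasing their asymptotic space
Reduce the time to find the SA interval for $aP$ from the SA interval for $P$
Explore whether the faster RLCSA can improve two-level indexing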
Innovation

Methods, ideas, or system contributions that make the work stand out.

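Review of RLCSA storage in $O(r \log(n/r) + r \log \sigma + \sigma)$ bits
SA interval for $aP$ found in $O(\log r_a)$ time instead of $O(\log r_a + \log \log n)$
Rank queries on sparse bitvectors replaced by a constant number of select queries (Nishimoto and Tabei, ICALP 2021)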