🤖 AI Summary
This paper addresses the cost of extending a pattern by one character (a backward-search step) in run-length compressed suffix arrays (RLCSAs). Given a character $a$ and the suffix-array interval for a pattern $P$, a standard RLCSA finds the interval for $aP$ in $O(\log r_a + \log \log n)$ time, where $r_a$ is the number of runs of copies of $a$ in the Burrows-Wheeler transform (BWT). We apply a result by Nishimoto and Tabei (ICALP 2021) to replace rank queries on sparse bitvectors with a constant number of select queries, which removes the additive $O(\log \log n)$ term and reduces the query time to $O(\log r_a)$. The space bound remains $O(r \log(n/r) + r \log \sigma + \sigma)$ bits, where $r$ is the number of runs in the BWT. We also review two-level indexing and discuss how the faster RLCSA may help improve it.
📝 Abstract
We review how we can store a run-length compressed suffix array (RLCSA) for a text $T$ of length $n$ over an alphabet of size $\sigma$ whose Burrows-Wheeler Transform (BWT) consists of $r$ runs in $O \left( r \log (n / r) + r \log \sigma + \sigma \right)$ bits such that later, given character $a$ and the suffix array interval for $P$, we can find the suffix-array (SA) interval for $a P$ in $O (\log r_a + \log \log n)$ time, where $r_a$ is the number of runs of copies of $a$ in the BWT. We then show how to modify the RLCSA such that we find the SA interval for $a P$ in only $O (\log r_a)$ time, without increasing its asymptotic space bound. Our key idea is applying a result by Nishimoto and Tabei (ICALP 2021) and replacing rank queries on sparse bitvectors by a constant number of select queries. Finally, we review two-level indexing and discuss how our faster RLCSA may be useful in improving it.
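
To make the query concrete, the following is a minimal Python sketch of the step described above: mapping the SA interval for $P$ to the SA interval for $a P$ over a run-length encoded BWT. It is an illustration under simplifying assumptions, not the paper's data structure. It keeps, for each character $a$, a plain sorted list of the positions where runs of $a$ start and answers rank by binary search over those $r_a$ runs; it uses neither sparse bitvectors nor the Nishimoto-Tabei technique, and it makes no attempt to meet the stated space bound. The names `bwt_of`, `RunLengthBWT`, `rank`, and `extend` are hypothetical.

```python
from bisect import bisect_left
from itertools import groupby


def bwt_of(text):
    """Naive BWT; `text` must end with a unique, smallest sentinel such as '$'."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in sa), sa


class RunLengthBWT:
    """Toy run-length BWT index (assumption: plain lists, not compressed)."""

    def __init__(self, bwt):
        self.n = len(bwt)
        self.starts = {}  # a -> sorted BWT positions where runs of a begin
        self.cum = {}     # a -> cum[k] = copies of a in its first k runs
        pos = 0
        for ch, grp in groupby(bwt):
            length = sum(1 for _ in grp)
            self.starts.setdefault(ch, []).append(pos)
            cum = self.cum.setdefault(ch, [0])
            cum.append(cum[-1] + length)
            pos += length
        # C[a] = number of characters in the text strictly smaller than a
        self.C, total = {}, 0
        for ch in sorted(self.cum):
            self.C[ch] = total
            total += self.cum[ch][-1]

    def rank(self, a, i):
        """Number of copies of a in bwt[0:i], via binary search over a's runs."""
        starts = self.starts.get(a, [])
        j = bisect_left(starts, i)  # number of runs of a starting before i
        if j == 0:
            return 0
        run_len = self.cum[a][j] - self.cum[a][j - 1]
        return self.cum[a][j - 1] + min(run_len, i - starts[j - 1])

    def extend(self, a, lo, hi):
        """Map the half-open SA interval [lo, hi) of P to that of aP."""
        if a not in self.C:
            return (0, 0)  # a does not occur in the text: empty interval
        return (self.C[a] + self.rank(a, lo), self.C[a] + self.rank(a, hi))


# Backward search for "ana" in "banana$", one character at a time.
bwt, sa = bwt_of("banana$")
index = RunLengthBWT(bwt)
lo, hi = 0, index.n
for a in reversed("ana"):
    lo, hi = index.extend(a, lo, hi)
print(sorted(sa[lo:hi]))  # text positions where "ana" occurs: [1, 3]
```

Each call to `extend` performs two rank queries, and each rank query costs one binary search over the $r_a$ runs of $a$. In this toy version that search is the whole cost; in a space-efficient RLCSA the rank queries on sparse bitvectors are what contribute the additional $O (\log \log n)$ term that the modification removes.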