Faster run-length compressed suffix arrays

📅 2024-08-08

🤖 AI Summary

This paper addresses the cost of extending a suffix-array (SA) interval by one character in run-length compressed suffix arrays (RLCSAs): given the SA interval for a pattern $P$ and a character $a$, find the SA interval for $aP$. It brings into the RLCSA a result by Nishimoto and Tabei (ICALP 2021), replacing rank queries on sparse bitvectors with a constant number of select queries, which removes the $O(\log \log n)$ term from the query time. The space bound stays $O(r \log(n/r) + r \log \sigma + \sigma)$ bits, where $r$ is the number of runs in the Burrows–Wheeler Transform (BWT), while the time per extension drops from $O(\log r_a + \log \log n)$ to $O(\log r_a)$, where $r_a$ is the number of runs of copies of $a$ in the BWT. The paper also reviews two-level indexing and discusses how the faster RLCSA may help improve it.

📝 Abstract

We review how we can store a run-length compressed suffix array (RLCSA) for a text $T$ of length $n$ over an alphabet of size $\sigma$ whose Burrows-Wheeler Transform (BWT) consists of $r$ runs in $O\left(r \log (n/r) + r \log \sigma + \sigma\right)$ bits such that later, given character $a$ and the suffix array interval for $P$, we can find the suffix-array (SA) interval for $aP$ in $O(\log r_a + \log \log n)$ time, where $r_a$ is the number of runs of copies of $a$ in the BWT. We then show how to modify the RLCSA such that we find the SA interval for $aP$ in only $O(\log r_a)$ time, without increasing its asymptotic space bound. Our key idea is applying a result by Nishimoto and Tabei (ICALP 2021) and replacing rank queries on sparse bitvectors by a constant number of select queries. Finally, we review two-level indexing and discuss how our faster RLCSA may be useful in improving it.
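A minimal sketch of the backward step described in the abstract, assuming a simplified layout that is not the paper's: for each character $a$, plain Python lists record where each run of $a$ starts in the BWT, how long it is, and how many copies of $a$ precede it. The SA interval $[s, e)$ for $P$ then maps to $[C[a] + \mathrm{rank}_a(s),\, C[a] + \mathrm{rank}_a(e))$ for $aP$, and each rank is a binary search over the $r_a$ runs of $a$, so one step takes $O(\log r_a)$ time. Because run boundaries are stored as uncompressed integers, this uses more space than the $O(r \log(n/r) + r \log \sigma + \sigma)$-bit bound; reaching $O(\log r_a)$ within that bound is what the paper contributes. The names `naive_bwt`, `RunLengthBWT`, `extend` and `count` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def naive_bwt(text):
    """BWT via naive suffix sorting; text must end with a unique, smallest sentinel such as '$'."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT character for suffix i is the one preceding it (text[-1] wraps to the sentinel).
    return "".join(text[i - 1] for i in sa)


class RunLengthBWT:
    """Run-length BWT with per-character run lists (simplified, uncompressed)."""

    def __init__(self, bwt):
        self.n = len(bwt)
        self.run_starts = defaultdict(list)   # BWT position where each run of a starts
        self.run_lengths = defaultdict(list)  # length of each run of a
        self.run_prefix = defaultdict(list)   # copies of a before each run of a
        counts = defaultdict(int)
        i = 0
        while i < self.n:
            a = bwt[i]
            j = i
            while j < self.n and bwt[j] == a:
                j += 1
            self.run_starts[a].append(i)
            self.run_lengths[a].append(j - i)
            self.run_prefix[a].append(counts[a])
            counts[a] += j - i
            i = j
        # C[a] = number of BWT characters smaller than a.
        self.C = {}
        total = 0
        for a in sorted(counts):
            self.C[a] = total
            total += counts[a]

    def rank(self, a, i):
        """Copies of a in bwt[0:i], via binary search over the r_a runs of a."""
        starts = self.run_starts.get(a, [])
        k = bisect_right(starts, i - 1) - 1  # last run of a starting before i
        if k < 0:
            return 0
        # Count the run fully if it ends before i, otherwise only its part before i.
        return self.run_prefix[a][k] + min(self.run_lengths[a][k], i - starts[k])

    def extend(self, a, interval):
        """Map the half-open SA interval [s, e) for P to the one for aP."""
        s, e = interval
        if a not in self.C:
            return (0, 0)
        return (self.C[a] + self.rank(a, s), self.C[a] + self.rank(a, e))


def count(rlbwt, pattern):
    """Number of occurrences of pattern, by backward steps from the full interval."""
    s, e = 0, rlbwt.n
    for a in reversed(pattern):
        s, e = rlbwt.extend(a, (s, e))
        if s >= e:
            return 0
    return e - s
```

For example, `naive_bwt("banana$")` gives `"annb$aa"`, which has $r = 5$ runs, and `count(RunLengthBWT("annb$aa"), "ana")` returns 2. The quadratic-time `naive_bwt` is only meant for small tests.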

Problem

Research questions and friction points this paper is trying to address.

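Speed up backward steps in run-length compressed suffix arrays (RLCSAs)

Compute the SA interval for $aP$ from that of $P$ without the $O(\log \log n)$ term, in the same asymptotic space

Explore how the faster RLCSA may improve two-level indexing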

Innovation

Methods, ideas, or system contributions that make the work stand out.

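Run-length compressed suffix arrays in $O(r \log(n/r) + r \log \sigma + \sigma)$ bits

SA-interval extension from $P$ to $aP$ in $O(\log r_a)$ time

Replacing rank queries on sparse bitvectors by a constant number of select queries, following Nishimoto and Tabei (ICALP 2021)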
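The key idea of replacing rank queries on sparse bitvectors by a constant number of select queries rests on the duality between the two operations. The sketch below illustrates only that general principle, not the paper's construction: on a bitvector stored as the sorted positions of its 1s, `select` is a direct lookup, whereas `rank` needs a predecessor search. If other information already narrows the answer to two candidates (the `hint` parameter is a hypothetical stand-in for that information), a single `select` call settles it. How the RLCSA obtains such information, following Nishimoto and Tabei, is not reproduced here. `SparseBitvector` and `rank_with_hint` are hypothetical names, and the plain sorted list stands in for a compressed encoding.

```python
from bisect import bisect_right


class SparseBitvector:
    """Length-n bitvector stored as the sorted positions of its 1s.

    A plain sorted list stands in for a compressed encoding (e.g. Elias-Fano);
    only the query logic is illustrated.
    """

    def __init__(self, n, ones):
        self.n = n
        self.ones = sorted(ones)

    def select(self, k):
        """Position of the 1 with 0-based index k: a direct lookup."""
        return self.ones[k]

    def rank(self, i):
        """Number of 1s in positions [0, i): needs a predecessor (binary) search."""
        return bisect_right(self.ones, i - 1)

    def rank_with_hint(self, i, hint):
        """rank(i) with a single select call instead of a search.

        Hypothetical precondition: the true rank(i) is hint or hint + 1.
        Uses the identity: rank(i) >= k + 1 exactly when select(k) < i.
        """
        if hint < len(self.ones) and self.select(hint) < i:
            return hint + 1
        return hint
```

With 1s at positions 2, 3, 7 and 15, `rank(8)` is 3, and `rank_with_hint(8, 2)` also returns 3 because `select(2) == 7 < 8`.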