AI Summary
This work addresses a key bottleneck in constructing the LCP array from the run-length compressed Burrows-Wheeler transform (RLBWT): the balancing phase of the move structure, which previously required $O(r \log r)$ time and dominated the overall complexity. The paper presents a balancing algorithm that runs in optimal $O(r)$ time and space. Move structures represent the RLBWT-derived permutations LF, FL, $\phi$, and $\phi^{-1}$, which are used to compute the LCP array directly from the RLBWT. With the new balancing procedure, the total running time of LCP array construction drops from $O(n + r \log r)$ to the optimal $O(n)$, using $O(r)$ space in addition to the output. This removes the $O(r \log r)$ barrier for LCP array construction in compressed space.
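To make "balancing" concrete, the Python sketch below measures the quantity that balancing bounds. It is not the paper's algorithm. It assumes a common formulation of a move structure over positions `[0, n)`: a sorted list `starts` of input-interval starts, plus `outs[i]`, the position that input interval `i` is moved to. After a move lands inside output interval `i`, a query fast-forwards past the input-interval starts lying in that output interval. Balancing splits intervals until no output interval contains too many such starts, which keeps each permutation step $O(1)$. The function names and the threshold of 4 are hypothetical; the exact constant varies between formulations.

```python
from bisect import bisect_left


def max_output_overlap(starts, outs, n):
    """Largest number of input-interval starts inside any one output interval.

    starts: strictly increasing input-interval starts with starts[0] == 0;
            interval i covers [starts[i], starts[i+1]) (the last one ends at n).
    outs:   outs[i] is where input interval i is moved to, so its output
            interval is [outs[i], outs[i] + length_i).
    """
    k = len(starts)
    worst = 0
    for i in range(k):
        length = (starts[i + 1] if i + 1 < k else n) - starts[i]
        lo, hi = outs[i], outs[i] + length
        # Count input starts s with lo <= s < hi via binary search.
        count = bisect_left(starts, hi) - bisect_left(starts, lo)
        worst = max(worst, count)
    return worst


def is_balanced(starts, outs, n, threshold=4):
    """Hypothetical balance test: every output interval holds < threshold starts."""
    return max_output_overlap(starts, outs, n) < threshold
```

For example, mapping the long input interval `[4, 8)` onto `[0, 4)`, which already contains four input starts, gives an overlap of 4. A balancing procedure would split that interval.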
Abstract
On repetitive text collections of size $n$, the Burrows-Wheeler Transform (BWT) tends to have relatively few runs $r$ compared to $n$, so its run-length encoded form (RLBWT) is small. This motivates many RLBWT-based algorithms and data structures designed to use compressed $O(r)$ space. These approaches often use the RLBWT-derived permutations LF, FL, $\phi$, and $\phi^{-1}$, which can be represented using a move structure to obtain optimal $O(1)$ time per permutation step in $O(r)$ space. These permutations are then used to construct compressed-space text indexes supporting efficient pattern matching queries. However, move structure construction in $O(r)$ space requires an $O(r \log r)$-time balancing stage.
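The idea of storing LF in $O(r)$ space can be illustrated with a toy Python sketch. It is not the paper's implementation. It builds the BWT naively by sorting rotations, assuming the text ends with a unique smallest sentinel `$`. It then run-length encodes the BWT and stores LF as a move structure with one input interval per run. Inside a run of a character $c$, LF maps consecutive rows to consecutive rows, so only each run's start and its LF image need to be kept. All function names are hypothetical. This structure is not balanced, so the fast-forward loop in `move` is not guaranteed to take $O(1)$ steps.

```python
from bisect import bisect_right


def bwt(text):
    """Naive BWT via sorted rotations; assumes a unique smallest sentinel '$' at the end."""
    n = len(text)
    rows = sorted(range(n), key=lambda i: text[i:] + text[:i])
    return "".join(text[(i - 1) % n] for i in rows)


def run_length_encode(s):
    """Return (chars, starts), one entry per maximal run of equal characters."""
    chars, starts = [], []
    for i, c in enumerate(s):
        if i == 0 or c != s[i - 1]:
            chars.append(c)
            starts.append(i)
    return chars, starts


def lf_move_structure(chars, starts, n):
    """Store LF with one input interval per BWT run: the image of each run start."""
    k = len(starts)
    lengths = [(starts[j + 1] if j + 1 < k else n) - starts[j] for j in range(k)]
    totals = {}
    for c, d in zip(chars, lengths):
        totals[c] = totals.get(c, 0) + d
    # C[c] = number of BWT characters smaller than c.
    C, acc = {}, 0
    for c in sorted(totals):
        C[c] = acc
        acc += totals[c]
    outs, seen = [], {}
    for c, d in zip(chars, lengths):
        outs.append(C[c] + seen.get(c, 0))  # LF(first row of this run)
        seen[c] = seen.get(c, 0) + d
    # dest[j] = index of the input interval containing outs[j].
    dest = [bisect_right(starts, q) - 1 for q in outs]
    return outs, dest


def move(i, j, starts, outs, dest):
    """Apply LF to row i lying in input interval j; return (LF(i), its interval)."""
    i2 = outs[j] + (i - starts[j])
    j2 = dest[j]
    # Fast-forward: the step count is what balancing keeps constant.
    while j2 + 1 < len(starts) and starts[j2 + 1] <= i2:
        j2 += 1
    return i2, j2


def invert(chars, starts, n):
    """Recover the text (without '$') by walking LF from row 0 over the runs."""
    outs, dest = lf_move_structure(chars, starts, n)
    i, j, out = 0, 0, []
    for _ in range(n - 1):
        out.append(chars[j])  # BWT[i] equals the character of i's run
        i, j = move(i, j, starts, outs, dest)
    return "".join(reversed(out))
```

For `"banana$"` the BWT is `"annb$aa"` with $r = 5$ runs. Walking LF over those five intervals spells the text backwards.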
The longest common prefix (LCP) array of a text collection supports pattern matching queries and data structure construction. Recently, it was shown how to compute the LCP array from an RLBWT in $O(n + r \log r)$ time and $O(r)$ additional space. However, the $O(r \log r)$-time move structure balancing stage remains the bottleneck. In this paper, we describe an optimal $O(r)$-time and space algorithm to balance a move structure. Applying this result to LCP construction from an RLBWT yields an optimal $O(n)$-time algorithm using $O(r)$ space in addition to the output, which implies an optimal-time algorithm for LCP array enumeration in compressed $O(r)$ space.
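The Python sketch below shows how $\phi$ enters LCP computation. It uses the classical uncompressed $\Phi$/permuted-LCP (PLCP) technique, not the paper's RLBWT-based method. It needs the full text and suffix array, which is exactly the $\Theta(n)$ space that the compressed $O(r)$-space approach avoids. Here $\phi(SA[k]) = SA[k-1]$. PLCP values in text order decrease by at most one from position $i$ to $i+1$, so the character comparisons total $O(n)$ once the suffix array is known. The naive suffix sort and the function names are illustrative assumptions.

```python
def suffix_array(text):
    """Naive suffix array (illustration only; not linear time)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_via_phi(text, sa):
    """LCP array via phi and the permuted LCP (PLCP) array."""
    n = len(sa)
    phi = [0] * n
    phi[sa[0]] = -1  # lexicographically smallest suffix has no predecessor
    for k in range(1, n):
        phi[sa[k]] = sa[k - 1]  # phi(SA[k]) = SA[k-1]
    plcp = [0] * n
    l = 0
    for i in range(n):  # text order
        j = phi[i]
        if j == -1:
            l = 0
        else:
            # Extend the match; l is a valid lower bound from position i-1.
            while i + l < n and j + l < n and text[i + l] == text[j + l]:
                l += 1
        plcp[i] = l
        l = max(l - 1, 0)  # PLCP[i+1] >= PLCP[i] - 1
    return [plcp[sa[k]] for k in range(n)]  # LCP[k] = PLCP[SA[k]]
```

For `"banana$"`, the suffix array is `[6, 5, 3, 1, 0, 4, 2]`, and the LCP array between lexicographically adjacent suffixes is `[0, 0, 1, 3, 0, 0, 2]`.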