🤖 AI Summary
Block trees are compressed text indexes with competitive query performance, but their construction is slow and the fastest construction algorithms need a large amount of working memory, which limits practical deployment. This work proposes fast and lightweight parallel algorithms for block tree construction, built on a data-parallel Karp–Rabin fingerprinting scheme. On one core, the method runs at a speed similar to that of the currently fastest construction algorithm; on 64 cores, it is up to four times faster. It does so while requiring an order of magnitude less memory. The data-parallel Karp–Rabin fingerprint computation is presented as a result of independent interest.
📝 Abstract
The block tree [Belazzougui et al., J. Comput. Syst. Sci. '21] is a compressed representation of a length-$n$ text that supports access, rank, and select queries while requiring only $O(z \log \frac{n}{z})$ words of space, where $z$ is the number of Lempel-Ziv factors of the text. In other words, its space requirements are asymptotically similar to those of the compressed text. In practice, block trees offer query performance comparable to that of state-of-the-art compressed rank and select indices. However, their construction is significantly slower. Additionally, the fastest construction algorithms require a significant amount of working memory. To address this issue, we propose fast and lightweight parallel algorithms for the efficient construction of block trees. Our algorithm achieves a speed similar to that of the currently fastest construction algorithm on one core and is up to four times faster using 64 cores. It achieves all that while requiring an order of magnitude less memory. As a result of independent interest, we present a data-parallel algorithm for Karp-Rabin fingerprint computation.
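To illustrate why Karp-Rabin fingerprints lend themselves to data-parallel computation, here is a minimal Python sketch. It is not the algorithm from the paper. It only shows the standard composition property: the fingerprint of a concatenation can be derived from the fingerprints of its parts, so chunks can be fingerprinted independently and combined afterwards. The modulus `PRIME`, the base `BASE`, and the function names `fingerprint`, `combine`, and `chunked_fingerprint` are assumptions chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed parameters for illustration only; not taken from the paper.
PRIME = (1 << 61) - 1  # a Mersenne prime used as the modulus
BASE = 256             # one "digit" per byte


def fingerprint(data: bytes) -> int:
    """Sequential Karp-Rabin fingerprint: sum of data[i] * BASE^(n-1-i) mod PRIME."""
    fp = 0
    for byte in data:
        fp = (fp * BASE + byte) % PRIME
    return fp


def combine(fp_left: int, fp_right: int, len_right: int) -> int:
    """Fingerprint of left + right, given both fingerprints and len(right)."""
    return (fp_left * pow(BASE, len_right, PRIME) + fp_right) % PRIME


def chunked_fingerprint(data: bytes, chunk_size: int = 4, workers: int = 4) -> int:
    """Fingerprint each chunk independently, then fold the results left to right."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(fingerprint, chunks))
    fp = 0
    for chunk, chunk_fp in zip(chunks, partial):
        fp = combine(fp, chunk_fp, len(chunk))
    return fp


if __name__ == "__main__":
    text = b"abracadabra"
    # Both computations must agree because of the composition property.
    print(chunked_fingerprint(text) == fingerprint(text))
```

The thread pool here only demonstrates that the per-chunk work is independent. In CPython, threads do not speed up this pure-Python loop, so an actual multicore implementation would use a compiled language or native parallel primitives.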