🤖 AI Summary
This paper addresses the high node metadata overhead and weak support for parallelism and external memory in long-string indexing. We propose zip-tries, a lightweight, dynamic, and memory-efficient string index structure. Our contributions are threefold: (1) the first node-level metadata compression achieving $O(\log\log n + \log\log(k/\alpha))$ bits per node, an exponential reduction; (2) a general parallel string comparison framework enabling $O(\log n)$ span for search and update operations under the PRAM model; and (3) an extension to the PEM model, yielding a parallel string B-tree variant with $O(\log_B n)$ I/O span and efficient parallel prefix queries. The theoretical analysis spans the RAM, word RAM, PRAM, and PEM models, ensuring optimal time complexity $O(k + \log n)$ while maintaining strong engineering practicality.
📄 Abstract
In this paper, we introduce zip-tries, which are simple, dynamic, memory-efficient data structures for strings. Zip-tries support search and update operations for $k$-length strings in $\mathcal{O}(k+\log n)$ time in the standard RAM model or in $\mathcal{O}(k/\alpha+\log n)$ time in the word RAM model, where $\alpha$ is the length of the longest string that can fit in a memory word, and $n$ is the number of strings in the trie. Importantly, we show how zip-tries can achieve this while only requiring $\mathcal{O}(\log\log n + \log\log\frac{k}{\alpha})$ bits of metadata per node w.h.p., which is an exponential improvement over previous results for long strings. Despite being considerably simpler and more memory efficient, we show how zip-tries perform competitively with state-of-the-art data structures on large datasets of long strings. Furthermore, we provide a simple, general framework for parallelizing string comparison operations in linked data structures, which we apply to zip-tries to obtain parallel zip-tries. Parallel zip-tries are able to achieve good search and update performance in parallel, performing such operations in $\mathcal{O}(\log n)$ span. We also apply our techniques to an existing external-memory string data structure, the string B-tree, obtaining a parallel string B-tree which performs search operations using $\mathcal{O}(\log_B n)$ I/O span and $\mathcal{O}(\frac{k}{\alpha B} + \log_B n)$ I/O work in the parallel external memory (PEM) model. The parallel string B-tree can perform prefix searches using only $\mathcal{O}(\frac{\log n}{\log\log n})$ span under the practical PRAM model. For the case of long strings that share short common prefixes, we provide LCP-aware variants of all our algorithms that should be quite efficient in practice, which we justify empirically.
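The word RAM bound $\mathcal{O}(k/\alpha+\log n)$ rests on comparing $\alpha$ characters per machine word rather than one character at a time. As a rough illustration of that idea (not the paper's implementation), the sketch below computes a longest-common-prefix length word-by-word, using XOR and the position of the highest differing bit to locate the first mismatching byte inside a word; `WORD_BYTES` standing in for $\alpha$ is an assumption for the example.

```python
WORD_BYTES = 8  # stand-in for alpha: characters packed per machine word (assumption)

def lcp_word_level(a: bytes, b: bytes) -> int:
    """Length of the longest common prefix of a and b, comparing a word at a time."""
    n = min(len(a), len(b))
    i = 0
    # Compare WORD_BYTES characters per step instead of one.
    while i + WORD_BYTES <= n:
        wa = int.from_bytes(a[i:i + WORD_BYTES], "big")
        wb = int.from_bytes(b[i:i + WORD_BYTES], "big")
        if wa != wb:
            # XOR exposes the first differing byte via its leading set bit.
            diff = wa ^ wb
            return i + (WORD_BYTES - 1 - (diff.bit_length() - 1) // 8)
        i += WORD_BYTES
    # Finish the tail one character at a time.
    while i < n and a[i] == b[i]:
        i += 1
    return i
```

With long shared prefixes this does $\mathcal{O}(k/\alpha)$ word comparisons instead of $\mathcal{O}(k)$ character comparisons, which is the speedup the word RAM bounds above exploit.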