🤖 AI Summary
This paper addresses the high node metadata overhead and weak support for parallelism and external memory in long-string indexing. We propose zip-tries, a lightweight, dynamic, and memory-efficient string index structure. Our contributions are threefold: (1) the first node-level metadata compression achieving $O(\log\log n + \log\log(k/\alpha))$ bits per node, an exponential reduction; (2) a general parallel string comparison framework enabling $O(\log n)$ span for search and update operations under the PRAM model; and (3) an extension to the PEM model, yielding a parallel string B-tree variant with $O(\log_B n)$ I/O span and efficient parallel prefix queries. The theoretical analysis spans the RAM, word RAM, PRAM, and PEM models, ensuring optimal time complexity $O(k + \log n)$ while maintaining strong engineering practicality.
📄 Abstract
In this paper, we introduce zip-tries, which are simple, dynamic, memory-efficient data structures for strings. Zip-tries support search and update operations for $k$-length strings in $\mathcal{O}(k+\log n)$ time in the standard RAM model or in $\mathcal{O}(k/\alpha+\log n)$ time in the word RAM model, where $\alpha$ is the length of the longest string that can fit in a memory word, and $n$ is the number of strings in the trie. Importantly, we show how zip-tries can achieve this while only requiring $\mathcal{O}(\log\log n + \log\log\frac{k}{\alpha})$ bits of metadata per node w.h.p., which is an exponential improvement over previous results for long strings. Despite being considerably simpler and more memory efficient, we show how zip-tries perform competitively with state-of-the-art data structures on large datasets of long strings. Furthermore, we provide a simple, general framework for parallelizing string comparison operations in linked data structures, which we apply to zip-tries to obtain parallel zip-tries. Parallel zip-tries are able to achieve good search and update performance in parallel, performing such operations in $\mathcal{O}(\log n)$ span. We also apply our techniques to an existing external-memory string data structure, the string B-tree, obtaining a parallel string B-tree which performs search operations using $\mathcal{O}(\log_B n)$ I/O span and $\mathcal{O}(\frac{k}{\alpha B} + \log_B n)$ I/O work in the parallel external memory (PEM) model. The parallel string B-tree can perform prefix searches using only $\mathcal{O}(\frac{\log n}{\log\log n})$ span under the practical PRAM model. For the case of long strings that share short common prefixes, we provide LCP-aware variants of all our algorithms that should be quite efficient in practice, which we justify empirically.
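The word RAM bound $\mathcal{O}(k/\alpha+\log n)$ rests on comparing $\alpha$ characters per machine word rather than one character at a time. As a rough illustration of that idea (not the paper's implementation), the sketch below computes a longest-common-prefix length word-by-word, using XOR and the position of the highest differing bit to locate the first mismatching byte inside a word; `WORD_BYTES` standing in for $\alpha$ is an assumption for the example.

```python
WORD_BYTES = 8  # stand-in for alpha: characters packed per machine word (assumption)

def lcp_word_level(a: bytes, b: bytes) -> int:
    """Length of the longest common prefix of a and b, comparing a word at a time."""
    n = min(len(a), len(b))
    i = 0
    # Compare WORD_BYTES characters per step instead of one.
    while i + WORD_BYTES <= n:
        wa = int.from_bytes(a[i:i + WORD_BYTES], "big")
        wb = int.from_bytes(b[i:i + WORD_BYTES], "big")
        if wa != wb:
            # XOR exposes the first differing byte via its leading set bit.
            diff = wa ^ wb
            return i + (WORD_BYTES - 1 - (diff.bit_length() - 1) // 8)
        i += WORD_BYTES
    # Finish the tail one character at a time.
    while i < n and a[i] == b[i]:
        i += 1
    return i
```

With long shared prefixes this does $\mathcal{O}(k/\alpha)$ word comparisons instead of $\mathcal{O}(k)$ character comparisons, which is the speedup the word RAM bounds above exploit.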