Modern Minimal Perfect Hashing: A Survey

📅 2025-06-06
🤖 AI Summary
This survey reviews the trade-offs among space efficiency, construction time, and query latency in modern minimal perfect hash functions (MPHFs), covering the progress made since the last comprehensive survey in 1997. The most space-efficient constructions come within 0.1% of the space lower bound of log₂(e) ≈ 1.4427 bits per key, others answer queries with a single memory access, and modern approaches scale to billions of keys. An extensive experimental evaluation compares the approaches and serves as a guide for selecting a perfect hash function in applications such as databases, bioinformatics, and stringology. The code and experimental infrastructure are open-sourced.

📝 Abstract
Given a set $S$ of $n$ keys, a perfect hash function for $S$ maps the keys in $S$ to the first $m \geq n$ integers without collisions. It may return an arbitrary result for any key not in $S$ and is called minimal if $m = n$. The most important parameters are its space consumption, construction time, and query time. Years of research now enable modern perfect hash functions to be extremely fast to query, very space-efficient, and scale to billions of keys. Different approaches give different trade-offs between these aspects. For example, the smallest constructions get within 0.1% of the space lower bound of $\log_2(e)$ bits per key. Others are particularly fast to query, requiring only one memory access. Perfect hashing has many applications, for example to avoid collision resolution in static hash tables, and is used in databases, bioinformatics, and stringology. Since the last comprehensive survey in 1997, significant progress has been made. This survey covers the latest developments and provides a starting point for getting familiar with the topic. Additionally, our extensive experimental evaluation can serve as a guide to select a perfect hash function for use in applications.
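
To make the definition concrete, here is a minimal Python sketch that finds a minimal perfect hash function for a tiny key set by brute-force seed search. It is only an illustration and is not presented as one of the constructions covered by the survey. The names `_slot`, `build_mphf`, and `query` are hypothetical, and the choice of BLAKE2b as the seeded hash is an assumption made for reproducibility.

```python
import hashlib


def _slot(key: str, seed: int, n: int) -> int:
    """Map a key to a slot in [0, n) using a seeded hash (assumed: BLAKE2b)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % n


def build_mphf(keys, max_seeds: int = 1_000_000) -> int:
    """Return a seed under which the keys map to 0..n-1 without collisions."""
    keys = list(keys)
    n = len(keys)
    if n == 0 or len(set(keys)) != n:
        raise ValueError("keys must be non-empty and distinct")
    for seed in range(max_seeds):
        # The function is minimal and perfect iff all n slots are distinct.
        if len({_slot(k, seed, n) for k in keys}) == n:
            return seed
    raise RuntimeError("no suitable seed found; key set too large for brute force")


def query(key: str, seed: int, n: int) -> int:
    """Evaluate the hash function; keys outside the set get an arbitrary slot."""
    return _slot(key, seed, n)


keys = ["apple", "banana", "cherry", "date", "elderberry", "fig"]
seed = build_mphf(keys)
slots = {k: query(k, seed, len(keys)) for k in keys}
print(seed, slots)
```

The only state that has to be stored is the seed, but the expected number of seeds tried grows roughly like $e^n$, so this approach is feasible only for very small sets. Keys that were not part of the input still receive some slot in the range, matching the definition's allowance for arbitrary results outside $S$.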
Problem

Research questions and friction points this paper is trying to address.

Survey modern minimal perfect hashing techniques
Compare space, construction, and query time trade-offs
Guide selection of perfect hash functions for applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Minimal perfect hashing with zero collisions
Space-efficient near theoretical lower bound
Single memory access for fast querying
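
The lower bound of $\log_2(e)$ bits per key can be motivated by a standard counting argument, sketched here informally under the assumption that the keys come from a universe much larger than $n$. A function drawn uniformly at random from all maps $S \to \{0, \dots, n-1\}$ is a bijection with probability $n!/n^n$, so about $n^n/n!$ candidate functions are needed before one is expected to work for a given $S$, and distinguishing among them takes the logarithm of that many bits. By Stirling's approximation:

$$
\log_2 \frac{n^n}{n!} \;=\; n \log_2 e \;-\; \tfrac{1}{2}\log_2(2\pi n) \;-\; O(1/n) \;\approx\; 1.4427\, n \ \text{bits}.
$$

Dividing by $n$ gives the per-key figure of $\log_2(e) \approx 1.4427$ bits, with the lower-order terms vanishing as $n$ grows.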