🤖 AI Summary
This work addresses the lack of efficient construction methods for minimal k-perfect hash functions (MkPHFs). We propose four novel algorithms—graph-coloring-based, greedy assignment, hierarchical hashing, and compact bit encoding—integrated with external-memory-friendly design and empirically driven optimizations. Our approach is the first to simultaneously outperform prior methods in space overhead, construction time, and query latency: achieving as low as 1.5 bits per key, millisecond-scale table construction, and nanosecond-scale queries, while mapping n keys into ⌈n/k⌉ buckets with at most k keys per bucket. The work systematically revitalizes k-perfect hashing research, with an open-source implementation validated on TB-scale datasets. It provides critical support for external-memory indexing and high-performance single-probe hash function generation.
📝 Abstract
Given a set S of n keys, a k-perfect hash function (kPHF) is a data structure that maps the keys to the first m integers such that each output integer is hit by at most k input keys. When m = ⌈n/k⌉, the resulting function is called a minimal k-perfect hash function (MkPHF). kPHFs are used in external-memory data structures and as building blocks for efficient 1-perfect hash functions, which in turn have a wide range of applications from databases to bioinformatics. Several papers from the 1980s study external-memory data structures with small internal-memory indexes. However, actual k-perfect hash functions are surprisingly rare, and the area has seen little research recently. At the same time, recent research in 1-perfect hashing shows that there is a lack of efficient kPHFs. In this paper, we revive the area of k-perfect hashing, presenting four new constructions. Our implementations simultaneously dominate older approaches in space consumption, construction time, and query time. We see this paper as a possible starting point for an active line of research, similar to the area of 1-perfect hashing.
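To make the definition concrete, the following is a minimal sketch of a brute-force MkPHF. It is not one of the four constructions in the paper. It retries seeds of an ordinary hash function until no bucket among the ⌈n/k⌉ buckets receives more than k keys. The function names (`bucket_of`, `is_k_perfect`, `build_naive_mkphf`), the choice of BLAKE2b as the underlying hash, and the `max_seeds` limit are all assumptions made for illustration. The expected number of retries grows quickly with n, so this is only practical for very small key sets.

```python
import hashlib
from collections import Counter


def bucket_of(key: str, seed: int, m: int) -> int:
    """Map `key` to a bucket in [0, m) using a seeded hash (illustrative choice)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % m


def is_k_perfect(keys, seed: int, m: int, k: int) -> bool:
    """Check the kPHF property: every bucket receives at most k keys."""
    loads = Counter(bucket_of(key, seed, m) for key in keys)
    return max(loads.values()) <= k


def build_naive_mkphf(keys, k: int, max_seeds: int = 1_000_000):
    """Brute-force search for a seed that yields a minimal k-perfect hash function.

    Returns (seed, m) with m = ceil(n / k). Hypothetical helper, not the
    paper's method.
    """
    if not keys:
        raise ValueError("key set must be non-empty")
    m = -(-len(keys) // k)  # ceil(n / k) using integer arithmetic
    for seed in range(max_seeds):
        if is_k_perfect(keys, seed, m, k):
            return seed, m
    raise RuntimeError("no suitable seed found within max_seeds attempts")


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(12)]
    seed, m = build_naive_mkphf(keys, k=4)
    loads = Counter(bucket_of(key, seed, m) for key in keys)
    print("buckets:", m, "loads:", dict(loads))
```

In this sketch the entire function is described by a single seed, and a query is one hash evaluation. Practical constructions store more information per key in exchange for fast construction, which is the trade-off between space consumption, construction time, and query time that the abstract refers to.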