Engineering Minimal k-Perfect Hash Functions

📅 2025-04-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of efficient construction methods for minimal k-perfect hash functions (MkPHFs), which map n keys into ⌈n/k⌉ buckets with at most k keys per bucket. The authors propose four new algorithms (graph-coloring-based, greedy assignment, hierarchical hashing, and compact bit encoding), combined with an external-memory-friendly design and empirically driven optimizations. The implementations simultaneously outperform prior methods in space overhead, construction time, and query latency, with reported figures as low as 1.5 bits per key, millisecond-scale table construction, and nanosecond-scale queries. The work revives k-perfect hashing research with an open-source implementation validated on TB-scale datasets, and it supports external-memory indexing and high-performance single-probe hash function generation.

📝 Abstract

Given a set S of n keys, a k-perfect hash function (kPHF) is a data structure that maps the keys to the first m integers such that each output integer is hit by at most k input keys. When m = ⌈n/k⌉, the resulting function is called a minimal k-perfect hash function (MkPHF). kPHFs are used in external memory data structures and to create efficient 1-perfect hash functions, which in turn have a wide range of applications from databases to bioinformatics. Several papers from the 1980s look at external memory data structures with small internal memory indexes. However, actual k-perfect hash functions are surprisingly rare, and the area has not seen much research recently. At the same time, recent research in 1-perfect hashing shows that there is a lack of efficient kPHFs. In this paper, we revive the area of k-perfect hashing, presenting four new constructions. Our implementations simultaneously dominate older approaches in space consumption, construction time, and query time. We see this paper as a possible starting point of an active line of research, similar to the area of 1-perfect hashing.
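
To make the definition concrete, the following Python sketch checks the k-perfect property and finds a minimal k-perfect mapping for a tiny key set by brute-force seed search. It is not one of the paper's four constructions, and it would not scale beyond toy inputs. The names `bucket_of`, `is_k_perfect`, and `find_seed` are hypothetical, and the use of seeded BLAKE2b as the underlying hash is an assumption made only for illustration.

```python
import hashlib
from collections import Counter
from math import ceil
from typing import Callable, Optional, Sequence, Tuple


def bucket_of(key: str, seed: int, m: int) -> int:
    """Hypothetical seeded hash: map a key to one of m buckets."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % m


def is_k_perfect(
    keys: Sequence[str], k: int, m: int, mapping: Callable[[str], int]
) -> bool:
    """True if every key lands in [0, m) and no bucket receives more than k keys."""
    loads = Counter(mapping(key) for key in keys)
    in_range = all(0 <= bucket < m for bucket in loads)
    return in_range and all(load <= k for load in loads.values())


def find_seed(
    keys: Sequence[str], k: int, max_tries: int = 100_000
) -> Optional[Tuple[int, int]]:
    """Brute-force search for a seed that yields a minimal k-perfect mapping.

    Returns (seed, m) with m = ceil(n / k), or None if no seed was found
    within max_tries. This is an illustrative toy, not an efficient method.
    """
    m = ceil(len(keys) / k)
    for seed in range(max_tries):
        if is_k_perfect(keys, k, m, lambda key: bucket_of(key, seed, m)):
            return seed, m
    return None


if __name__ == "__main__":
    keys = [f"key-{i}" for i in range(12)]
    result = find_seed(keys, k=4)
    if result is not None:
        seed, m = result
        print("seed:", seed, "buckets:", m)
        print("loads:", Counter(bucket_of(key, seed, m) for key in keys))
```

With 12 keys and k = 4, the minimal range is m = 3, so a valid seed must place exactly 4 keys in each bucket. In the external-memory setting mentioned in the abstract, each bucket would correspond to one block of capacity k, so a query needs to read only the single block that its key maps to.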

Problem

Research questions and friction points this paper is trying to address.

Design minimal k-perfect hash functions for efficient key mapping

Address lack of efficient k-perfect hash functions in current research

Improve space, construction time, and query time of kPHFs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Engineering minimal k-perfect hash functions

Four new constructions for kPHFs

Implementations dominate older approaches in space consumption, construction time, and query time

🔎 Similar Papers

BinomialHash: A Constant Time, Minimal Memory Consistent Hash Algorithm