Combined Search and Encoding for Seeds, with an Application to Minimal Perfect Hashing

📅 2025-02-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the storage overhead of seeds in randomized data structures, where methods that can fail are retried with independent randomness and the index of each successful attempt is stored. The standard approach stores, for each of n substructures, the smallest successful seed; the paper's central observation is that this is not space-optimal. It presents an algorithm that combines the search for successful seeds with their encoding and produces a seed sequence whose entropy undercuts that of the standard sequence by Ω(n) bits in most cases. Reaching a memory consumption of OPT + εn increases the expected number of inspected seeds by a factor of O(1/ε). As an application, the authors construct a minimal perfect hash function with space requirement (1+ε)OPT and construction time O(n/ε), whereas the construction times of all previous approaches grow exponentially with 1/ε. The implementation beats the construction throughput of the state of the art by up to two orders of magnitude.

📝 Abstract

Randomised algorithms often employ methods that can fail and that are retried with independent randomness until they succeed. Randomised data structures therefore often store indices of successful attempts, called seeds. If $n$ such seeds are required (e.g., for independent substructures) the standard approach is to compute for each $i \in [n]$ the smallest successful seed $S_i$ and store $\vec{S} = (S_1, \ldots, S_n)$. The central observation of this paper is that this is not space-optimal. We present a different algorithm that computes a sequence $\vec{S}' = (S_1', \ldots, S_n')$ of successful seeds such that the entropy of $\vec{S}'$ undercuts the entropy of $\vec{S}$ by $\Omega(n)$ bits in most cases. To achieve a memory consumption of $\mathrm{OPT}+\varepsilon n$, the expected number of inspected seeds increases by a factor of $O(1/\varepsilon)$. We demonstrate the usefulness of our findings with a novel construction for minimal perfect hash functions with space requirement $(1+\varepsilon)\mathrm{OPT}$. The construction time is $O(n/\varepsilon)$ while all previous approaches have construction times that increase exponentially with $1/\varepsilon$. Our implementation beats the construction throughput of the state of the art by up to two orders of magnitude.
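
The standard approach described in the abstract can be illustrated with a small Python sketch. It is not the paper's algorithm: it only shows the baseline of storing the smallest successful seed and why that baseline leaves a constant number of bits per seed on the table. The sketch assumes that every attempt succeeds independently with the same probability `p`, so the smallest successful seed is geometrically distributed, and it takes `log2(1/p)` bits per seed as the benchmark (the abstract does not define OPT, so this benchmark is an assumption). The function names are hypothetical.

```python
import math


def smallest_successful_seed(succeeds, max_seed=1_000_000):
    """Baseline search: try seeds 0, 1, 2, ... and return the first that works.

    `succeeds` is a hypothetical predicate standing in for one randomized
    attempt (for example, "does this seed give a collision-free hash?").
    """
    for seed in range(max_seed):
        if succeeds(seed):
            return seed
    raise RuntimeError("no successful seed found below max_seed")


def geometric_entropy_bits(p):
    """Entropy in bits of the smallest successful seed.

    Assumes each seed succeeds independently with probability p, so the
    smallest successful seed follows a geometric distribution.
    """
    if p >= 1.0:
        return 0.0
    binary_entropy = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return binary_entropy / p


def gap_bits(p):
    """Bits per seed by which the baseline exceeds the assumed benchmark log2(1/p)."""
    return geometric_entropy_bits(p) - math.log2(1 / p)


if __name__ == "__main__":
    for p in (0.5, 0.1, 0.01, 0.001):
        print(f"p={p}: entropy={geometric_entropy_bits(p):.3f} bits, "
              f"gap={gap_bits(p):.3f} bits")
```

Under these assumptions the gap approaches `log2(e)` bits per seed as `p` becomes small, which over `n` seeds is consistent with the linear savings the abstract claims.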

Problem

Research questions and friction points this paper is trying to address.

Optimizes seed storage space in randomized algorithms

Reduces entropy in successful seed sequences

Enhances minimal perfect hash function construction efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combined seed search and encoding

Reduced entropy for seed storage

Efficient minimal perfect hashing construction
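
To make the role of a seed in minimal perfect hashing concrete, the following is a minimal sketch of a brute-force seed search on one small key set. It is a generic illustration, not the construction from the paper: the helper names `seeded_hash` and `find_mphf_seed`, the use of BLAKE2b, and the example keys are all assumptions chosen for this example.

```python
import hashlib


def seeded_hash(key: str, seed: int, n: int) -> int:
    """Map `key` to a slot in range(n) using a hash that depends on `seed`."""
    data = f"{seed}:{key}".encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big") % n


def find_mphf_seed(keys, max_seed=1_000_000):
    """Return the smallest seed under which the keys map to distinct slots.

    A seed is "successful" when the n keys occupy all n slots without
    collisions, which makes the seeded hash a minimal perfect hash
    function for exactly this key set.
    """
    n = len(keys)
    for seed in range(max_seed):
        slots = {seeded_hash(key, seed, n) for key in keys}
        if len(slots) == n:
            return seed
    raise RuntimeError("no successful seed found below max_seed")


if __name__ == "__main__":
    keys = ["apple", "banana", "cherry", "date"]
    seed = find_mphf_seed(keys)
    print("seed:", seed)
    for key in keys:
        print(key, "->", seeded_hash(key, seed, len(keys)))
```

Only the seed has to be stored to evaluate the function later. A data structure that splits its input into many small key sets stores one such seed per set, which is the setting in which the encoding of the whole seed sequence matters.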

🔎 Similar Papers

BinomialHash: A Constant Time, Minimal Memory Consistent Hash Algorithm