HierarchicalKV: A GPU Hash Table with Cache Semantics for Continuous Online Embedding Storage

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses severe memory inefficiency in conventional GPU hash tables when embedding tables exceed the capacity of a single GPU’s high-bandwidth memory (HBM), as these structures retain all key-value pairs regardless of access patterns. To overcome this limitation, the authors propose HierarchicalKV—the first GPU hash table that treats caching semantics as a first-class operation. It replaces traditional dictionary semantics with a policy-driven eviction mechanism that either updates entries in place, evicts a lower-scored entry, or rejects the insertion, thereby avoiding costly rehashing and overflow failures. Key innovations include cache-line-aligned buckets, inline score-driven upserts, dynamic dual-bucket selection, three-level concurrency control, and a hierarchical key-value separation architecture. Evaluated on an NVIDIA H100 NVL, HierarchicalKV achieves up to 3.9 billion key-value operations per second, maintains load factors between 0.50 and 1.00 with less than 5% throughput variation, outperforms WarpCore by 1.4×, and surpasses indirect-addressing baselines by 2.6–9.4×. It has already been integrated into multiple open-source recommendation frameworks.

📝 Abstract
Traditional GPU hash tables preserve every inserted key -- a dictionary assumption that wastes scarce High Bandwidth Memory (HBM) when embedding tables routinely exceed single-GPU capacity. We challenge this assumption with cache semantics, where policy-driven eviction is a first-class operation. We introduce HierarchicalKV (HKV), the first general-purpose GPU hash table library whose normal full-capacity operating contract is cache-semantic: each full-bucket upsert (update-or-insert) is resolved in place by eviction or admission rejection rather than by rehashing or capacity-induced failure. HKV co-designs four core mechanisms -- cache-line-aligned buckets, in-line score-driven upsert, score-based dynamic dual-bucket selection, and triple-group concurrency -- and uses tiered key-value separation as a scaling enabler beyond HBM. On an NVIDIA H100 NVL GPU, HKV achieves up to 3.9 billion key-value pairs per second (B-KV/s) find throughput, stable across load factors 0.50-1.00 (<5% variation), and delivers 1.4x higher find throughput than WarpCore (the strongest dictionary-semantic GPU baseline at lambda=0.50) and up to 2.6-9.4x over indirection-based GPU baselines. Since its open-source release in October 2022, HKV has been integrated into multiple open-source recommendation frameworks.
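The cache-semantic contract described in the abstract — every full-bucket upsert resolved in place by an in-place update, a score-driven eviction, or an admission rejection, never by rehashing or a capacity failure — can be sketched as follows. This is a minimal single-bucket Python illustration of the policy, not the HierarchicalKV API; `Bucket`, `upsert`, and the capacity of 4 are hypothetical choices for the demo.

```python
# Sketch (not the HKV API): a cache-semantic upsert on one fixed-capacity
# bucket. A full bucket never rehashes and never fails on capacity; it
# updates in place, evicts the lowest-scored entry, or rejects admission.

BUCKET_CAPACITY = 4  # real buckets are cache-line aligned; 4 keeps the demo small

class Bucket:
    def __init__(self):
        self.entries = {}  # key -> (value, score)

    def upsert(self, key, value, score):
        """Return 'updated', 'inserted', 'evicted', or 'rejected'."""
        if key in self.entries:
            self.entries[key] = (value, score)   # in-place update
            return "updated"
        if len(self.entries) < BUCKET_CAPACITY:
            self.entries[key] = (value, score)   # spare slot: plain insert
            return "inserted"
        # Full bucket: locate the current minimum-score entry.
        victim = min(self.entries, key=lambda k: self.entries[k][1])
        if score > self.entries[victim][1]:
            del self.entries[victim]             # policy-driven eviction
            self.entries[key] = (value, score)
            return "evicted"
        return "rejected"                        # admission rejection

b = Bucket()
for k in range(BUCKET_CAPACITY):
    b.upsert(k, f"emb{k}", score=k + 10)         # fill with scores 10..13
print(b.upsert(99, "hot", score=100))            # beats score 10 -> evicted
print(b.upsert(98, "cold", score=1))             # below all scores -> rejected
```

In the actual library the score would encode the eviction policy (e.g. recency or frequency) and the comparison would run warp-parallel over a cache-line-aligned bucket; the decision logic per upsert is the point of the sketch.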
Problem

Research questions and friction points this paper is trying to address.

GPU hash table
embedding storage
cache semantics
High Bandwidth Memory
online embedding
Innovation

Methods, ideas, or system contributions that make the work stand out.

cache semantics
GPU hash table
embedding storage
eviction policy
key-value separation
Haidong Rong
Senior HPC Specialist, Distributed Algorithm Specialist, Recommender System Specialist, @NVIDIA
Senior Systems Software Engineer, Open-source tech lead
Jiashu Yao
NVIDIA
Matthias Langer
NVIDIA
Shijie Liu
NVIDIA
Li Fan
Tencent
Dongxin Wang
Vipshop
Jia He
BOSS Zhipin
Jinglin Chen
University of Illinois Urbana-Champaign
Reinforcement Learning, Machine Learning
Jiaheng Rang
ByteDance
Julian Qian
Snap
Mengyao Xu
NVIDIA
Fan Yu
NVIDIA
Minseok Lee
NVIDIA
Parallel Computing, CUDA Programming, Recommender System, Deep Learning, Computer Architecture
Zehuan Wang
NVIDIA
Even Oldridge
NVIDIA