🤖 AI Summary
To address the memory bottleneck caused by large embedding tables for high-cardinality categorical features in deep recommendation systems, this paper proposes a graph-aware hashing method based on modularity-driven clustering over the user-item bipartite graph. Unlike conventional random or learned hashing approaches, this method is the first to incorporate graph clustering into embedding compression: it partitions semantically similar entities into interpretable, training-free groups by maximizing bipartite modularity, enabling parameter sharing across grouped embeddings. Theoretically, the paper establishes an intrinsic connection between modularity optimization and message-passing mechanisms. Empirically, the approach reduces embedding table size by over 75% while achieving an average 101.52% gain in recall, substantially outperforming diverse hashing baselines, and performs strongly on both retrieval and CTR prediction tasks.
📝 Abstract
Deep recommender systems rely heavily on large embedding tables to handle high-cardinality categorical features such as user/item identifiers, and face significant memory constraints at scale. To tackle this challenge, hashing techniques are often employed to map multiple entities to the same embedding and thus reduce the size of the embedding tables. Concurrently, graph-based collaborative signals have emerged as powerful tools in recommender systems, yet their potential for optimizing embedding table reduction remains unexplored. This paper introduces GraphHash, the first graph-based approach that leverages modularity-based bipartite graph clustering on user-item interaction graphs to reduce embedding table sizes. We demonstrate that the modularity objective has a theoretical connection to message-passing, which provides a foundation for our method. By employing fast clustering algorithms, GraphHash serves as a computationally efficient proxy for message-passing during preprocessing and a plug-and-play graph-based alternative to traditional ID hashing. Extensive experiments show that GraphHash substantially outperforms diverse hashing baselines on both retrieval and click-through rate (CTR) prediction tasks. In particular, GraphHash achieves on average a 101.52% improvement in recall when reducing the embedding table size by more than 75%, highlighting the value of graph-based collaborative information for model reduction. Our code is available at https://github.com/snap-research/GraphHash.
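The core idea, replacing a random ID hash with cluster membership so that entities in the same community share one embedding row, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the modularity-based clustering (e.g., a Louvain-style algorithm on the user-item bipartite graph) has already been run during preprocessing, and takes its hypothetical output `clusters` as given.

```python
# Sketch: use graph-cluster assignments as hash buckets for the embedding table.
# Assumption: `clusters` is the (hypothetical) output of modularity-based
# clustering of the bipartite interaction graph, mapping raw entity ID -> cluster.

def build_hash_map(cluster_of):
    """Map each raw entity ID to a compact bucket ID, one bucket per cluster."""
    bucket_of_cluster = {}  # cluster label -> contiguous bucket index
    hash_map = {}           # raw entity ID -> bucket index
    for entity, cluster in cluster_of.items():
        if cluster not in bucket_of_cluster:
            bucket_of_cluster[cluster] = len(bucket_of_cluster)
        hash_map[entity] = bucket_of_cluster[cluster]
    return hash_map

# Hypothetical clustering of 8 item IDs into 3 communities.
clusters = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "c", 6: "c", 7: "a"}
id_to_bucket = build_hash_map(clusters)

num_rows_full = len(clusters)                      # one embedding row per raw ID
num_rows_hashed = len(set(id_to_bucket.values()))  # one row per cluster
print(num_rows_full, "->", num_rows_hashed)        # 8 -> 3 rows
```

At lookup time the model indexes the smaller table with `id_to_bucket[raw_id]`, so all members of a community share (and jointly train) one embedding vector; the table shrinks from one row per entity to one row per cluster.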