🤖 AI Summary
To address the memory bottleneck caused by large embedding tables for high-cardinality categorical features in deep recommendation systems, this paper proposes a graph-aware hashing method based on modularity-driven clustering over the user-item bipartite graph. Unlike conventional random or learned hashing approaches, this method is the first to incorporate graph clustering into embedding compression: it partitions semantically similar entities into interpretable, training-free groups by maximizing bipartite modularity, enabling parameter sharing across grouped embeddings. Theoretically, the paper establishes an intrinsic connection between modularity optimization and message-passing mechanisms. Empirically, the approach reduces embedding table size by over 75% while achieving an average 101.52% gain in recall, substantially outperforming diverse hashing baselines, and performs strongly on both retrieval and CTR prediction tasks.
📝 Abstract
Deep recommender systems rely heavily on large embedding tables to handle high-cardinality categorical features such as user/item identifiers, and face significant memory constraints at scale. To tackle this challenge, hashing techniques are often employed to map multiple entities to the same embedding and thus reduce the size of the embedding tables. Concurrently, graph-based collaborative signals have emerged as powerful tools in recommender systems, yet their potential for optimizing embedding table reduction remains unexplored. This paper introduces GraphHash, the first graph-based approach that leverages modularity-based bipartite graph clustering on user-item interaction graphs to reduce embedding table sizes. We demonstrate that the modularity objective has a theoretical connection to message-passing, which provides a foundation for our method. By employing fast clustering algorithms, GraphHash serves as a computationally efficient proxy for message-passing during preprocessing and a plug-and-play graph-based alternative to traditional ID hashing. Extensive experiments show that GraphHash substantially outperforms diverse hashing baselines on both retrieval and click-through rate (CTR) prediction tasks. In particular, GraphHash achieves on average a 101.52% improvement in recall when reducing the embedding table size by more than 75%, highlighting the value of graph-based collaborative information for model reduction. Our code is available at https://github.com/snap-research/GraphHash.
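The core idea, replacing a random ID hash with cluster membership so that entities in the same community share one embedding row, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the modularity-based clustering (e.g., a Louvain-style algorithm on the user-item bipartite graph) has already been run during preprocessing, and takes its hypothetical output `clusters` as given.

```python
# Sketch: use graph-cluster assignments as hash buckets for the embedding table.
# Assumption: `clusters` is the (hypothetical) output of modularity-based
# clustering of the bipartite interaction graph, mapping raw entity ID -> cluster.

def build_hash_map(cluster_of):
    """Map each raw entity ID to a compact bucket ID, one bucket per cluster."""
    bucket_of_cluster = {}  # cluster label -> contiguous bucket index
    hash_map = {}           # raw entity ID -> bucket index
    for entity, cluster in cluster_of.items():
        if cluster not in bucket_of_cluster:
            bucket_of_cluster[cluster] = len(bucket_of_cluster)
        hash_map[entity] = bucket_of_cluster[cluster]
    return hash_map

# Hypothetical clustering of 8 item IDs into 3 communities.
clusters = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "c", 6: "c", 7: "a"}
id_to_bucket = build_hash_map(clusters)

num_rows_full = len(clusters)                      # one embedding row per raw ID
num_rows_hashed = len(set(id_to_bucket.values()))  # one row per cluster
print(num_rows_full, "->", num_rows_hashed)        # 8 -> 3 rows
```

At lookup time the model indexes the smaller table with `id_to_bucket[raw_id]`, so all members of a community share (and jointly train) one embedding vector; the table shrinks from one row per entity to one row per cluster.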