🤖 AI Summary
This paper proposes the Hierarchical Structured Neural Network (HSNN) to address three key challenges in the retrieval stage of large-scale recommendation systems: weak interaction modeling in embedding-based retrieval, inconsistency between learned representations and the index, and poor adaptability to shifts in user and item distributions. HSNN jointly optimizes Modular Neural Networks (MoNN) with a learnable hierarchical approximate nearest neighbor (ANN) index, which overcomes the limitations of dot-product similarity and enables sublinear-complexity retrieval. Through end-to-end joint training and a continuous online learning mechanism, HSNN achieves high-accuracy, low-latency retrieval over billion-scale item catalogs. Offline evaluations demonstrate a 3.2× speedup in retrieval latency and an average 4.7% AUC improvement over state-of-the-art methods.
📝 Abstract
Retrieval, the initial stage of a recommendation system, is tasked with down-selecting items from a pool of tens of millions of candidates to a few thousand. Embedding-Based Retrieval (EBR) has been a typical choice for this problem, addressing the computational demands of deep neural networks across vast item corpora. EBR utilizes Two Tower or Siamese Networks to learn representations for users and items, and employs Approximate Nearest Neighbor (ANN) search to efficiently retrieve relevant items. Despite its popularity in industry, EBR faces limitations. The Two Tower architecture relies on a single dot-product interaction, so it has limited capacity to learn expressive interactions between users and items and struggles to capture complex data distributions. Additionally, ANN index building and representation learning for users and items are often separate, leading to inconsistencies that are exacerbated by representation drift (e.g., from continuous online training) and item drift (e.g., items expiring and new items being added). In this paper, we introduce the Hierarchical Structured Neural Network (HSNN), an efficient deep neural network model that learns intricate user and item interactions beyond the commonly used dot product in retrieval tasks, while achieving computational cost that is sublinear in corpus size. A Modular Neural Network (MoNN) is designed to maintain high expressiveness for interaction learning while ensuring efficiency. A mixture of MoNNs operates on a hierarchical item index to achieve extensive computation sharing, enabling it to scale to large corpus sizes. MoNN and the hierarchical index are jointly learnt to continuously adapt to distribution shifts in both user interests and item distributions. HSNN achieves substantial improvement in offline evaluation compared to prevailing methods.
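To make the contrast in the abstract concrete, the following is a minimal NumPy sketch, not the paper's implementation. It compares brute-force dot-product retrieval, which scores every item, with retrieval over a one-level cluster index, where a richer interaction function is evaluated once per cluster and shared by all items in that cluster. Everything here is an illustrative assumption: the random embeddings, the fixed balanced cluster assignment (the paper learns the index jointly), the tiny untrained MLP `interaction_score` standing in for MoNN, and the names `dot_product_topk`, `hierarchical_retrieve`, and `num_probe`.

```python
import numpy as np

rng = np.random.default_rng(0)
num_items, num_clusters, dim = 10_000, 100, 16

# Assumption: random vectors stand in for learned user/item embeddings.
item_emb = rng.normal(size=(num_items, dim))
user_emb = rng.normal(size=dim)


def dot_product_topk(user, items, k):
    """Baseline EBR scoring: one dot product per item (linear in corpus size)."""
    scores = items @ user
    top = np.argpartition(-scores, k - 1)[:k]   # unordered top-k
    return top[np.argsort(-scores[top])]        # sorted by descending score


# Assumption: a fixed, balanced item-to-cluster assignment (100 items per cluster).
cluster_of_item = rng.permutation(np.arange(num_items) % num_clusters)
# Each cluster is represented by the mean embedding of its items.
cluster_emb = np.stack(
    [item_emb[cluster_of_item == c].mean(axis=0) for c in range(num_clusters)]
)

# Assumption: a small untrained MLP as a stand-in for a learned interaction module.
W1 = rng.normal(size=(3 * dim, 32)) * 0.1
w2 = rng.normal(size=32) * 0.1


def interaction_score(user, clusters):
    """Score a user against every cluster with a non-dot-product interaction."""
    u = np.broadcast_to(user, clusters.shape)
    feats = np.concatenate([u, clusters, u * clusters], axis=1)
    hidden = np.maximum(feats @ W1, 0.0)        # ReLU
    return hidden @ w2


def hierarchical_retrieve(user, num_probe):
    """Evaluate the interaction module once per cluster, then return the
    items of the best-scoring clusters; items share their cluster's score."""
    cluster_scores = interaction_score(user, cluster_emb)
    probed = np.argsort(-cluster_scores)[:num_probe]
    return np.flatnonzero(np.isin(cluster_of_item, probed))


baseline = dot_product_topk(user_emb, item_emb, k=10)
candidates = hierarchical_retrieve(user_emb, num_probe=5)
print(len(baseline), len(candidates))
```

In this sketch the interaction function runs `num_clusters` times per request instead of `num_items` times, which is the kind of computation sharing the abstract describes. A deeper hierarchy would apply the same idea recursively.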