🤖 AI Summary
This work addresses the efficient mapping of high-dimensional sparse ID features (e.g., user/item IDs) to low-dimensional continuous embeddings in recommender systems. We survey mainstream embedding paradigms from 2015–2023 and group them into three families: collaborative filtering, self-supervised learning (contrastive and generative), and graph-based methods (e.g., node2vec). We organize these methods into a unified taxonomy and examine the trade-offs among accuracy, generalization, and computational cost. To bridge the gap between theoretical modeling and industrial deployment, we review three lightweight optimization directions: AutoML-based hyperparameter and architecture tuning, hash-based compression, and low-bit quantization. Our contributions include: (i) a classification framework that clarifies the design principles and limitations of existing methods; (ii) a review of lightweight techniques for scalable, memory-efficient, and latency-aware embedding generation; and (iii) guidelines and future directions for building efficient, deployable, production-ready embedding-based recommender systems.
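To make "low-bit quantization" concrete, the sketch below compresses one embedding vector to 8-bit integers and restores an approximation. It assumes a symmetric per-vector scheme (one float scale per row, values clipped to [-127, 127]), which is only one common choice among those the survey discusses; the function names are hypothetical and stdlib-only.

```python
from typing import List, Tuple


def quantize_int8(vec: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-vector int8 quantization (illustrative assumption).

    scale = max|x| / 127, q = round(x / scale), clipped to [-127, 127].
    """
    max_abs = max((abs(x) for x in vec), default=0.0)
    # Guard against an all-zero vector, which would give a zero scale.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(x / scale))) for x in vec]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover an approximation of the original float vector."""
    return [v * scale for v in q]


embedding = [0.12, -0.5, 0.33, 0.0]
q, scale = quantize_int8(embedding)
approx = dequantize_int8(q, scale)
```

Storing each coordinate as one signed byte instead of a 4-byte float32 cuts table memory roughly fourfold (plus one scale per row), at the cost of a rounding error of at most `scale / 2` per coordinate.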
📝 Abstract
Recommender systems have become an essential component of many online platforms, providing personalized recommendations to users. A crucial aspect of these systems is embedding techniques, which convert high-dimensional discrete features, such as user and item IDs, into low-dimensional continuous vectors and can thereby enhance recommendation performance. Embeddings capture complex relationships among entities, and this capability has spurred substantial research. In this survey, we provide an overview of the recent literature on embedding techniques in recommender systems, covering collaborative filtering, self-supervised learning, and graph-based methods. Collaborative filtering generates embeddings that capture user-item preferences and performs well on sparse data. Self-supervised methods leverage contrastive or generative learning for various tasks. Graph-based techniques such as node2vec exploit complex relationships in network-rich environments. To address the scalability challenges inherent to embedding methods, the survey also explores emerging directions that aim to improve performance and reduce computational complexity, namely Automated Machine Learning (AutoML), hashing techniques, and quantization techniques. We discuss the relevant architectures and techniques and highlight open challenges and future directions in each area. This survey aims to provide a comprehensive overview of the state of the art in this rapidly evolving field and to serve as a useful resource for researchers and practitioners working on recommender systems.
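As a rough illustration of how a discrete ID becomes a low-dimensional vector, and how hashing bounds the table size, here is a minimal sketch of the "hashing trick". The class and method names are hypothetical, the table is randomly initialized rather than trained (in a real model its rows would be learned by gradient descent), and MD5 is used only as an assumed stable hash because Python's built-in `hash()` is salted per process for strings.

```python
import hashlib
import random
from typing import List


class HashedEmbedding:
    """Hypothetical hashed embedding table: any string ID -> dense vector."""

    def __init__(self, num_buckets: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        # One row per bucket instead of one row per distinct ID;
        # small random values stand in for learned parameters.
        self.table = [
            [rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_buckets)
        ]

    def bucket(self, raw_id: str) -> int:
        # Stable hash of the raw ID, reduced modulo the table size.
        digest = hashlib.md5(raw_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little") % self.num_buckets

    def lookup(self, raw_id: str) -> List[float]:
        # Distinct IDs may collide and share a row: this is the
        # memory-versus-accuracy trade-off of hash-based compression.
        return self.table[self.bucket(raw_id)]


emb = HashedEmbedding(num_buckets=1000, dim=8)
v = emb.lookup("user_42")
```

Memory now scales with `num_buckets * dim` rather than with the number of distinct IDs, which is why hashing is attractive for very large, open-ended ID vocabularies.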