🤖 AI Summary
This work addresses the inherent tension between embedding expressiveness and serving efficiency in the candidate generation stage of recommender systems. The authors propose a novel training strategy that replaces conventional dense embeddings with high-dimensional sparse embeddings in the collaborative filtering autoencoder ELSA, enabling efficient learning of sparse representations in candidate retrieval models for the first time. The approach reduces embedding size by an order of magnitude without sacrificing accuracy—achieving only a 2.5% performance drop even under 100× compression—and simultaneously reveals an interpretable inverted index structure aligned with latent semantics. This structure naturally supports seamless integration of segment-level recommendations, such as those required for 2D homepage layouts, thereby jointly optimizing efficiency, representational capacity, and interpretability.
📝 Abstract
Behavioral patterns captured in embeddings learned from interaction data are pivotal across various stages of production recommender systems. However, in the initial retrieval stage, practitioners face an inherent tradeoff between embedding expressiveness and the scalability and latency of serving components, resulting in the need for representations that are both compact and expressive. To address this challenge, we propose a training strategy for learning high-dimensional sparse embedding layers in place of conventional dense ones, balancing efficiency, representational expressiveness, and interpretability. To demonstrate our approach, we modified the production-grade collaborative filtering autoencoder ELSA, achieving up to 10x reduction in embedding size with no loss of recommendation accuracy, and up to 100x reduction with only a 2.5% loss. Moreover, the active embedding dimensions reveal an interpretable inverted-index structure that segments items in a way directly aligned with the model's latent space, thereby enabling integration of segment-level recommendation functionality (e.g., 2D homepage layouts) within the candidate retrieval model itself. Source code, additional results, and a live demo are available at https://github.com/zombak79/compressed_elsa
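To make the inverted-index idea concrete, here is a minimal sketch. It does not reproduce the paper's actual training strategy; instead it assumes a simple top-k magnitude sparsification of toy item embeddings (the function `top_k_sparsify` and all sizes are illustrative), then reads off the dimension-to-items index that the abstract describes:

```python
import numpy as np

# Toy sizes for illustration only; the paper works with high-dimensional
# sparse embedding layers in a production-grade autoencoder (ELSA).
rng = np.random.default_rng(0)
n_items, dim, k = 6, 16, 3

dense = rng.standard_normal((n_items, dim))  # stand-in for learned embeddings

def top_k_sparsify(E, k):
    """Keep only the k largest-magnitude entries per row, zero the rest.

    This is a generic sparsification heuristic, assumed here for
    illustration; it is not the training method proposed in the paper.
    """
    idx = np.argsort(-np.abs(E), axis=1)[:, :k]
    S = np.zeros_like(E)
    rows = np.arange(E.shape[0])[:, None]
    S[rows, idx] = E[rows, idx]
    return S

sparse = top_k_sparsify(dense, k)

# Inverted index: each active embedding dimension lists the items that use
# it, yielding the item-segment structure aligned with the latent space.
inverted = {}
for item in range(n_items):
    for d in np.nonzero(sparse[item])[0]:
        inverted.setdefault(int(d), []).append(item)
```

Each key of `inverted` acts as a latent "segment" whose member items share that active dimension, which is what allows segment-level recommendations (e.g., rows of a 2D homepage layout) to be served directly from the retrieval model's representation.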