Evaluation on Entity Matching in Recommender Systems

📅 2026-01-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of a rigorous evaluation framework for cross-dataset entity matching, which hinders the development of large language model–driven conversational recommendation systems and knowledge-enhanced datasets. To bridge this gap, we introduce Reddit-Amazon-EM, a novel dataset comprising human-annotated correspondences between movie entities from Reddit-Movies and Amazon'23, establishing the first high-quality gold standard for entity alignment across recommendation system datasets. We systematically evaluate a diverse set of matching approaches—including rule-based, graph-based, lexical, embedding-based, and large language model–based methods—and provide a reproducible benchmark. The annotated gold standard and the best-performing entity mappings are publicly released, offering a critical resource and foundational infrastructure for advancing entity alignment research in recommender systems.

📝 Abstract
Entity matching is a crucial component in various recommender systems, including conversational recommender systems (CRS) and knowledge-based recommender systems. However, the lack of rigorous evaluation frameworks for cross-dataset entity matching impedes progress in areas such as LLM-driven conversational recommendations and knowledge-grounded dataset construction. In this paper, we introduce Reddit-Amazon-EM, a novel dataset comprising naturally occurring items from Reddit and the Amazon'23 dataset. Through careful manual annotation, we identify corresponding movies across Reddit-Movies and Amazon'23, two existing recommender system datasets with inherently overlapping catalogs. Leveraging Reddit-Amazon-EM, we conduct a comprehensive evaluation of state-of-the-art entity matching methods, including rule-based, graph-based, lexical, embedding-based, and LLM-based approaches. For reproducible research, we release our manually annotated entity matching gold set and provide the mapping between the two datasets using the best-performing method from our experiments. This serves as a valuable resource for advancing future work on entity matching in recommender systems. Data and code are accessible at: https://github.com/huang-zihan/Reddit-Amazon-Entity-Matching.
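
To illustrate what scoring a matcher against a gold set can look like, here is a minimal sketch of a lexical baseline and a precision/recall/F1 computation. It is not the paper's implementation: the function names, the 0.8 similarity threshold, the use of Python's `difflib`, and the choice of metrics are all assumptions made for illustration, and the toy titles in the usage example are hypothetical rather than drawn from Reddit-Amazon-EM.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase a title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def lexical_match(source_titles, target_titles, threshold=0.8):
    """Map each source title to its most similar target title.

    Similarity is difflib's SequenceMatcher ratio on normalized titles.
    A source title maps to None when no candidate reaches the threshold.
    The threshold value is an illustrative assumption, not from the paper.
    """
    mapping = {}
    for src in source_titles:
        best, best_score = None, 0.0
        for tgt in target_titles:
            score = SequenceMatcher(None, normalize(src), normalize(tgt)).ratio()
            if score > best_score:
                best, best_score = tgt, score
        mapping[src] = best if best_score >= threshold else None
    return mapping


def precision_recall_f1(predicted, gold):
    """Score predicted source-to-target pairs against gold pairs.

    Both arguments are dicts from a source entity to a target entity,
    with None meaning "no match".
    """
    predicted_pairs = {(s, t) for s, t in predicted.items() if t is not None}
    gold_pairs = {(s, t) for s, t in gold.items() if t is not None}
    true_positives = len(predicted_pairs & gold_pairs)
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    reddit_titles = ["inception", "the  matrix"]
    amazon_titles = ["Inception", "The Matrix", "Interstellar"]
    gold = {"inception": "Inception", "the  matrix": "The Matrix"}

    predicted = lexical_match(reddit_titles, amazon_titles)
    print(precision_recall_f1(predicted, gold))
```

The same scoring function could be reused for any matcher that produces a source-to-target mapping, so a rule-based, embedding-based, or LLM-based method would only need to replace `lexical_match`.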
Problem

Research questions and friction points this paper is trying to address.

entity matching
recommender systems
cross-dataset evaluation
conversational recommender systems
knowledge-based recommender systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

entity matching
recommender systems
cross-dataset evaluation
Reddit-Amazon-EM
LLM-based matching