🤖 AI Summary
Recommender systems research suffers from low dataset discoverability and poor reproducibility because datasets are fragmented across sources and their metadata is heterogeneous. To address this, we propose RecDatasetSearch, a community-driven, interpretable dataset search engine. Methodologically, it combines structured metadata modeling with semantic retrieval powered by pretrained language models, enabling joint queries over multiple attributes: dataset names, descriptions, and recommendation task domains. It also provides fine-grained relevance attribution that explains why each result matches the query, and establishes an open, versioned, community-contributed process for curating dataset metadata. Experimental evaluation demonstrates significant improvements in retrieval accuracy and transparency. The platform is fully open-sourced and publicly deployed, thereby enhancing reproducibility and fostering sustainable collaboration in recommendation research.
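Multi-attribute search with per-field relevance attribution can be illustrated with a small sketch. It assumes that a result's score is a weighted sum of per-field similarities, so each field's share of the score doubles as an explanation. The summary does not specify how the system actually combines fields. A bag-of-words cosine similarity stands in here for the pretrained language-model embeddings the summary mentions, and the field names, `FIELD_WEIGHTS`, `Hit`, and `search` are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical field weights; the real system's weighting is not described.
FIELD_WEIGHTS = {"name": 0.3, "description": 0.5, "domain": 0.2}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (stand-in for embeddings)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Hit:
    dataset: str
    score: float
    contributions: dict[str, float]   # weighted similarity contributed by each field
    matched_terms: dict[str, list[str]]  # query terms found in each field


def search(query: str, datasets: list[dict[str, str]], top_k: int = 5) -> list[Hit]:
    """Rank datasets by a weighted sum of per-field similarities to the query."""
    q = Counter(tokenize(query))
    hits = []
    for ds in datasets:
        contributions, matched = {}, {}
        for field, weight in FIELD_WEIGHTS.items():
            field_vec = Counter(tokenize(ds.get(field, "")))
            contributions[field] = weight * cosine(q, field_vec)
            matched[field] = sorted(q.keys() & field_vec.keys())
        score = sum(contributions.values())
        if score > 0:  # drop datasets with no overlap in any field
            hits.append(Hit(ds["name"], score, contributions, matched))
    hits.sort(key=lambda h: h.score, reverse=True)
    return hits[:top_k]
```

For example, with a two-entry catalog, a query for "movie ratings" ranks a movie-ratings dataset first. Its `contributions` and `matched_terms` show that the match came entirely from the description field, which is the kind of per-result explanation the summary describes.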
📝 Abstract
Accessing suitable datasets is critical for research and development in recommender systems. However, finding datasets that match specific recommendation tasks or domains remains a challenge because sources are scattered and metadata is inconsistent. To address this gap, we propose a community-driven and explainable dataset search engine tailored for recommender system research. Our system supports semantic search across multiple dataset attributes, such as dataset names, descriptions, and recommendation domains, and provides explanations of search relevance to enhance transparency. The system encourages community participation by allowing users to contribute standardized dataset metadata to a public repository. By improving dataset discoverability and search interpretability, the system helps researchers reproduce prior work more efficiently. The platform is publicly available at: https://ds4rs.com.
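Community contributions of "standardized dataset metadata" imply some check that each submitted entry conforms to a shared schema before it enters the public repository. The abstract does not publish that schema. The sketch below is therefore a minimal illustration, assuming a hypothetical record with `name`, `description`, `domain`, `url`, and optional `tasks` fields. `REQUIRED_FIELDS`, `DatasetRecord`, `validate`, and `to_record` are illustrative names, not part of ds4rs.

```python
from dataclasses import dataclass, field

# Hypothetical required fields; the actual ds4rs metadata schema is not specified.
REQUIRED_FIELDS = ("name", "description", "domain", "url")


@dataclass
class DatasetRecord:
    name: str
    description: str
    domain: list[str]
    url: str
    tasks: list[str] = field(default_factory=list)


def validate(entry: dict) -> list[str]:
    """Return a list of problems with a contributed entry; empty means acceptable."""
    problems = [f"missing field: {k}" for k in REQUIRED_FIELDS if not entry.get(k)]
    url = entry.get("url")
    if url and not str(url).startswith(("http://", "https://")):
        problems.append("url must start with http:// or https://")
    domain = entry.get("domain")
    if domain and (not isinstance(domain, list) or not all(isinstance(d, str) for d in domain)):
        problems.append("domain must be a list of strings")
    return problems


def to_record(entry: dict) -> DatasetRecord:
    """Normalize a valid entry into a record; raise ValueError listing all problems."""
    problems = validate(entry)
    if problems:
        raise ValueError("; ".join(problems))
    return DatasetRecord(
        name=entry["name"].strip(),
        description=entry["description"].strip(),
        domain=[d.strip().lower() for d in entry["domain"]],  # canonical lowercase domains
        url=entry["url"],
        tasks=list(entry.get("tasks", [])),
    )
```

Reporting every problem at once, rather than stopping at the first, gives a contributor one complete list to fix before resubmitting. Normalizing domains to lowercase keeps community-submitted labels consistent for search.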