🤖 AI Summary
Problem: The relational learning community lacks high-quality, standardized multi-relational benchmark datasets. Method: We construct the first systematic, open-source repository of multi-relational datasets, integrating 148 real-world SQL databases (in MySQL format) with diverse table schemas and relationship patterns. We propose a unified metadata modeling framework encompassing standardized schema descriptions, self-relationship statistics, and semantic annotations. Additionally, we provide a searchable web interface to support efficient dataset discovery and empirical evaluation. Contribution/Results: This repository fills a critical gap in benchmarking for relational learning, graph neural networks, and inductive logic programming, domains previously constrained by synthetic or narrow-scale data. It enables rigorous, reproducible model development and algorithmic assessment on authentic, heterogeneous relational structures. The resource has been widely adopted in both academic research and industrial applications for training and evaluating relational AI systems.
📝 Abstract
The aim of the Prague Relational Learning Repository is to support machine learning research with multi-relational data. The repository currently contains 148 SQL databases hosted on a public MySQL server located at https://relational.fel.cvut.cz. The server is provided by the Czech Technical University (CTU). A searchable meta-database provides metadata (e.g., the number of tables in the database, the number of rows and columns in the tables, the number of self-relationships).
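As an illustration of the kind of metadata listed above, the following is a minimal sketch that computes the number of tables, the rows and columns per table, and the number of self-relationships for a small database. It is not the repository's own tooling: it uses an in-memory SQLite database as a stand-in for the MySQL server, the `department`/`employee` schema is invented for the example, and the helper names `build_demo_database` and `summarize_schema` are hypothetical. A self-relationship is taken here to mean a foreign key whose referenced table is the table that declares it.

```python
import sqlite3


def build_demo_database():
    """Create a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE department (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE employee (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            manager_id INTEGER REFERENCES employee(id),      -- self-relationship
            department_id INTEGER REFERENCES department(id)
        );
        INSERT INTO department VALUES (1, 'Research'), (2, 'Operations');
        INSERT INTO employee VALUES
            (1, 'Ada', NULL, 1),
            (2, 'Bob', 1, 1),
            (3, 'Eve', 1, 2);
        """
    )
    return conn


def summarize_schema(conn):
    """Count tables, rows, columns and self-referencing foreign keys."""
    cur = conn.cursor()
    tables = [
        row[0]
        for row in cur.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        ).fetchall()
    ]
    summary = {"table_count": len(tables), "self_relationship_count": 0, "tables": {}}
    for table in tables:
        quoted = '"' + table.replace('"', '""') + '"'
        row_count = cur.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        column_count = len(cur.execute(f"PRAGMA table_info({quoted})").fetchall())
        # foreign_key_list rows: (id, seq, referenced_table, from, to, ...).
        # Distinct ids are counted so a composite key is counted once.
        foreign_keys = cur.execute(f"PRAGMA foreign_key_list({quoted})").fetchall()
        self_keys = {fk[0] for fk in foreign_keys if fk[2].lower() == table.lower()}
        summary["self_relationship_count"] += len(self_keys)
        summary["tables"][table] = {"rows": row_count, "columns": column_count}
    return summary


if __name__ == "__main__":
    print(summarize_schema(build_demo_database()))
```

On a MySQL server the same counts would more naturally come from the `information_schema` tables (`TABLES`, `COLUMNS`, `KEY_COLUMN_USAGE`) than from SQLite pragmas. The connection details for the repository's server are not given here, so the sketch deliberately avoids assuming them.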