RDB2G-Bench: A Comprehensive Benchmark for Automatic Graph Modeling of Relational Databases

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Many approaches exist for modeling a relational database (RDB) as a graph, yet no systematic benchmark has been available for comparing them, which makes the best graph structure hard to identify. Method: The paper introduces RDB2G-Bench, the first benchmark framework for automatic RDB-to-graph modeling. It covers five real-world databases and twelve downstream prediction tasks, yielding approximately 50,000 graph-structure–performance pairs. Graphs follow the usual construction in which table rows become nodes and foreign-key relationships become edges. Nine automatic RDB-to-graph modeling methods are benchmarked, and the performance of each candidate graph is precomputed so that methods can be evaluated without repeated model training. Contribution/Results: Thanks to the precomputed pairs, benchmarking runs over 600× faster than on-the-fly evaluation. The analysis identifies key structural patterns that affect graph model effectiveness and draws practical implications for graph design.

📝 Abstract
Relational databases (RDBs) are composed of interconnected tables, where relationships between them are defined through foreign keys. Recent research on applying machine learning to RDBs has explored graph-based representations of RDBs, where rows of tables are modeled as nodes, and foreign key relationships are modeled as edges. RDB-to-graph modeling helps capture cross-table dependencies, ultimately leading to enhanced performance across diverse tasks. However, there are numerous ways to model RDBs as graphs, and performance varies significantly depending on the chosen graph model. In our analysis, applying a common heuristic rule for graph modeling leads to up to a 10% drop in performance compared to the best-performing graph model, which remains non-trivial to identify. To foster research on intelligent RDB-to-graph modeling, we introduce RDB2G-Bench, the first benchmark framework for evaluating such methods. We construct extensive datasets covering 5 real-world RDBs and 12 predictive tasks, resulting in around 50k graph-performance pairs for efficient and reproducible evaluations. Thanks to our precomputed datasets, we were able to benchmark 9 automatic RDB-to-graph modeling methods on the 12 tasks over 600x faster than on-the-fly evaluation, which requires repeated model training. Our analysis of the datasets and benchmark results reveals key structural patterns affecting graph model effectiveness, along with practical implications for effective graph modeling.
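
The row-as-node, foreign-key-as-edge construction described in the abstract can be sketched as follows. This is a minimal illustration, not the benchmark's own code: the toy tables, the `foreign_keys` mapping, and the `rdb_to_graph` helper are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy database: each table maps primary key -> row.
tables = {
    "customers": {1: {"name": "Ada"}, 2: {"name": "Lin"}},
    "orders": {
        10: {"customer_id": 1, "amount": 30.0},
        11: {"customer_id": 1, "amount": 12.5},
        12: {"customer_id": 2, "amount": 99.0},
    },
}
# Hypothetical schema: (child table, foreign-key column) -> parent table.
foreign_keys = {("orders", "customer_id"): "customers"}


def rdb_to_graph(tables, foreign_keys):
    """Return (nodes, edges): one node per row, one edge per foreign-key reference."""
    # Each row becomes a node identified by (table name, primary key).
    nodes = {(t, pk): row for t, rows in tables.items() for pk, row in rows.items()}
    edges = defaultdict(list)
    for (child, column), parent in foreign_keys.items():
        edge_type = (child, column, parent)
        for pk, row in tables[child].items():
            parent_pk = row.get(column)
            if parent_pk in tables[parent]:  # skip NULL or dangling references
                edges[edge_type].append(((child, pk), (parent, parent_pk)))
    return nodes, dict(edges)


nodes, edges = rdb_to_graph(tables, foreign_keys)
```

Different graph models of the same database then correspond to different choices about which tables and foreign-key edge types to keep, which is the design space the benchmark explores.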
Problem

Research questions and friction points this paper is trying to address.

Evaluating the performance of various RDB-to-graph modeling methods
Identifying the optimal graph model for a given relational database
Providing a benchmark framework for intelligent RDB-to-graph modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark framework for RDB-to-graph modeling
Precomputed datasets for efficient evaluations
Analysis of structural patterns in graph models
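
To illustrate why precomputed graph-performance pairs make evaluation cheap, here is a rough sketch of a search method that queries a lookup table instead of training a model for each candidate graph. Everything in it is an assumption for illustration: the edge-type names, the encoding of a graph model as a set of edge types, the greedy search, and the scores, which are made-up toy values and not results from the paper.

```python
# Hypothetical edge types of a toy schema.
A, B, C = "orders->customers", "reviews->customers", "reviews->products"
EDGE_TYPES = (A, B, C)

# Made-up scores for every candidate graph model (NOT results from the paper).
precomputed = {
    frozenset(): 0.60,
    frozenset({A}): 0.66,
    frozenset({B}): 0.63,
    frozenset({C}): 0.61,
    frozenset({A, B}): 0.70,
    frozenset({A, C}): 0.65,
    frozenset({B, C}): 0.64,
    frozenset({A, B, C}): 0.68,
}


def greedy_search(edge_types, lookup):
    """Add one edge type at a time while the looked-up score improves."""
    current = frozenset()
    best = lookup(current)
    improved = True
    while improved:
        improved = False
        best_candidate = current
        for edge_type in edge_types:
            if edge_type in current:
                continue
            candidate = current | {edge_type}
            score = lookup(candidate)  # table lookup replaces a training run
            if score > best:
                best, best_candidate, improved = score, candidate, True
        current = best_candidate
    return current, best


best_graph, best_score = greedy_search(EDGE_TYPES, precomputed.__getitem__)
```

In this toy table the graph that keeps every edge type is not the best one, which mirrors the abstract's point that a common heuristic can underperform the best graph model.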
Dongwon Choi
KAIST AI
Machine Learning
Sunwoo Kim
Kim Jaechul Graduate School of AI, KAIST
Juyeon Kim
KAIST AI
Multimodal Learning, Information Retrieval, Large Language Models
Kyungho Kim
KAIST AI
Recommender Systems, Graph Neural Networks, Machine Learning
Geon Lee
Kim Jaechul Graduate School of AI, KAIST
Shinhwan Kang
Kim Jaechul Graduate School of AI, KAIST
Myunghwan Kim
Kumo.AI
Kijung Shin
Associate Professor, KAIST
Data Mining, Graph Mining, Network Science