From Schema to Signal: Retrieval-Augmented Modeling for Relational Data Analytics

📅 2026-05-14

🤖 AI Summary
This work addresses the challenge of modeling the multi-table structure and complex inter-table relationships of relational databases by proposing the Retrieval-Augmented Modeling (RAM) framework. RAM treats tuple attributes as textual tokens, builds a contextual document for each tuple via random walks, and applies information retrieval techniques to these documents to uncover semantic relationships beyond the explicit schema graph. The framework introduces two retrieval-based augmentations that address the limitations of methods relying solely on schema-defined connections: ATRA, which uses intra-table relevance for contrastive learning, and ETRA, which links semantically related tuples across tables to enhance graph connectivity. Combined with a layer-wise architecture of attribute embedding, feature integration, and graph aggregation layers, RAM consistently outperforms existing baselines across diverse prediction tasks on five real-world relational databases.
📝 Abstract
Relational data stored in relational database management systems (RDBMSs) is foundational to many real-world applications across domains such as e-commerce, finance, and social networking. While deep neural networks (DNNs) have achieved strong performance on single-table tabular data, extending these models to relational databases is challenging due to the normalized multi-table structure and complex inter-table relationships. Existing approaches often rely strictly on schema-defined graphs, which overlook implicit semantic signals embedded in tuple attributes and suffer from rigid connectivity. In this work, we propose Retrieval-Augmented Modeling (RAM), a novel framework that combines graph structure with attribute semantics for relational data analytics. RAM treats tuple attributes as tokens and uses random walks to construct contextual documents, enabling the use of information retrieval techniques to estimate semantic relevance between tuples. Building on these documents, we introduce two retrieval-based augmentations: ATRA, which leverages intra-table relevance for contrastive learning, and ETRA, which links semantically related tuples across tables to enhance graph connectivity. We then propose a layer-wise model architecture tailored for relational data, comprising attribute embedding, feature integration, and graph aggregation layers to enable expressive and flexible representation learning. Extensive experiments on five real-world relational databases demonstrate that RAM consistently outperforms existing baselines in diverse prediction tasks, establishing a new state of the art for relational data analytics.
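To make the document-construction idea concrete, here is a minimal sketch of how random walks over a schema graph could turn tuples into token documents that an information retrieval scorer can compare. It is an illustration under stated assumptions, not the paper's implementation: the toy tuples, the foreign-key edges, the walk parameters, and every function name are hypothetical, and TF-IDF with cosine similarity stands in for whatever retrieval technique RAM actually uses, which the abstract does not specify.

```python
import math
import random
from collections import Counter

# Hypothetical toy database: tuple id -> attribute tokens.
TUPLES = {
    "user:1": ["alice", "berlin"],
    "user:2": ["bob", "berlin"],
    "order:1": ["laptop", "electronics"],
    "order:2": ["phone", "electronics"],
    "order:3": ["novel", "books"],
}
# Hypothetical foreign-key links, treated as undirected edges for walking.
FK_EDGES = [("user:1", "order:1"), ("user:2", "order:2"), ("user:2", "order:3")]


def build_adjacency(edges):
    """Build an undirected adjacency list from foreign-key pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def walk_document(start, adj, rng, walk_len=3, n_walks=4):
    """Collect the start tuple's tokens plus tokens of tuples visited by random walks."""
    doc = list(TUPLES[start])
    for _ in range(n_walks):
        node = start
        for _ in range(walk_len):
            neighbors = adj.get(node)
            if not neighbors:
                break
            node = rng.choice(neighbors)
            doc.extend(TUPLES[node])
    return doc


def tfidf_vectors(docs):
    """Weight each token by term frequency times a smoothed inverse document frequency."""
    n = len(docs)
    df = Counter(tok for doc in docs.values() for tok in set(doc))
    vecs = {}
    for key, doc in docs.items():
        tf = Counter(doc)
        vecs[key] = {
            tok: count * (math.log((1 + n) / (1 + df[tok])) + 1.0)
            for tok, count in tf.items()
        }
    return vecs


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query, vecs, k=2):
    """Return the k tuples most similar to the query tuple, excluding itself."""
    scores = [(cosine(vecs[query], vec), key) for key, vec in vecs.items() if key != query]
    return sorted(scores, reverse=True)[:k]


rng = random.Random(0)  # fixed seed so the walks are reproducible
adj = build_adjacency(FK_EDGES)
docs = {tid: walk_document(tid, adj, rng) for tid in TUPLES}
vecs = tfidf_vectors(docs)
neighbors = top_k("order:1", vecs)
print(neighbors)
```

In this sketch, restricting the candidates in `top_k` to tuples of the same table would correspond loosely to the intra-table relevance that ATRA uses for contrastive learning, while keeping only high-scoring tuples from other tables as extra edges would correspond loosely to the cross-table links that ETRA adds to the graph.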
Problem

Research questions and friction points this paper is trying to address.

relational data
deep neural networks
schema
semantic signals
graph connectivity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Modeling
relational data analytics
semantic relevance
graph augmentation
contrastive learning