Incorporating Deep Learning Design in Database Queries

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

career value

160K/year

🤖 AI Summary

This work proposes a declarative relational deep learning paradigm that overcomes the limitations of traditional approaches, which require converting relational data into graph structures and processing them in external frameworks—leading to substantial engineering overhead and disconnection from the database. By introducing learnable vector embeddings as provenance annotations for tuples, the approach naturally extends SQL queries to jointly operate on both data and embeddings, enabling deep integration of neural networks with database query execution. The authors implement a prototype system, RelaNN, built on PyTorch and cuDF, which supports diverse models including graph convolutional networks, heterogeneous graph Transformers, and hypergraph neural networks. The system demonstrates competitive performance with concise code, validating the feasibility and efficiency of implementing advanced neural architectures directly within the database.

📝 Abstract

Deep learning over relational databases is conventionally realized by translating data into graph representations and applying graph-based neural networks within external frameworks. This round-trip between the database and external machine learning (ML) systems introduces non-trivial engineering overhead. In effect, these graph neural networks operate on tuple embeddings and manipulate them in ways that capture the interactions induced by relational joins. Given this natural correspondence, there is no fundamental reason why specifying a neural network over relational data should be substantially harder than querying it. We propose an approach that naturally integrates deep learning with database queries. The key idea is to associate each tuple with provenance, represented as a vector embedding with learnable parameters. Queries are lifted to operate jointly on data and embeddings, mapping input relations with embedded tuples to output relations with embedded tuples. This approach provides a declarative foundation for relational deep learning, facilitating integration with database systems, optimization, and wide adoption. We describe RelaNN, a proof-of-concept implementation of this approach built on top of PyTorch and cuDF. We illustrate the utility of RelaNN by implementing various graph-learning models, including graph convolutional networks, heterogeneous graph transformers, hypergraph neural networks and deep homomorphism networks. The simplicity of the programs and their competitive runtime performance demonstrate a concrete path toward making the implementation of state-of-the-art neural networks over databases as simple as writing a query.

Problem

Research questions and friction points this paper is trying to address.

relational databases

deep learning

graph neural networks

database queries

tuple embeddings

Innovation

Methods, ideas, or system contributions that make the work stand out.

relational deep learning

tuple embedding

declarative neural networks