RelAgent: LLM Agents as Data Scientists for Relational Learning

📅 2026-05-08

🤖 AI Summary

Existing relational learning approaches, such as graph-based, tabular, and sequence-based models, often suffer from poor interpretability, computational inefficiency, or the lack of a unified framework. To address these limitations, this work proposes RelAgent, an autonomous data scientist built on large language models (LLMs). RelAgent uses a two-stage architecture for efficient and interpretable relational prediction. In the search stage, an LLM agent autonomously writes human-readable SQL feature programs and selects a classical machine learning model. In the inference stage, predictions are made solely by deterministic SQL queries and the chosen model, without invoking the LLM. Because every feature is a readable query and feature computation runs as ordinary SQL, the design is intrinsically interpretable, computationally efficient, and scalable, so the predictor can be deployed with standard database systems.
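
To make the inference stage concrete, here is a minimal sketch using Python's standard-library sqlite3 module. The customers/orders schema, the FEATURE_SQL query, and the WEIGHTS and BIAS values are hypothetical stand-ins, not taken from the paper. The sketch only illustrates the shape of the final predictor: a SQL feature query followed by a fixed classical model (here a simple linear scoring rule), with no LLM in the loop.

```python
import sqlite3

# Hypothetical toy schema (not from the paper): customers and their orders.
SCHEMA = """
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
"""

# A human-readable SQL feature program: it maps every customer to two features.
FEATURE_SQL = """
SELECT c.customer_id,
       COUNT(o.order_id)            AS n_orders,
       COALESCE(SUM(o.amount), 0.0) AS total_amount
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id
ORDER BY c.customer_id
"""

# Hypothetical weights standing in for the classical model fixed during search.
WEIGHTS = {"n_orders": 0.5, "total_amount": 0.01}
BIAS = -1.0


def build_demo_db():
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(1, 50.0), (1, 30.0), (2, 10.0)],
    )
    return conn


def predict(conn):
    """Inference: run the SQL feature program, then apply the linear rule.

    No LLM is called here; the same database state always yields the same output.
    """
    preds = {}
    for customer_id, n_orders, total_amount in conn.execute(FEATURE_SQL):
        score = (BIAS + WEIGHTS["n_orders"] * n_orders
                 + WEIGHTS["total_amount"] * total_amount)
        preds[customer_id] = 1 if score > 0 else 0
    return preds


print(predict(build_demo_db()))  # → {1: 1, 2: 0, 3: 0}
```

Because the feature map is a plain query, a reader can audit exactly which rows and aggregates drive each prediction, and the query can run wherever the data already lives.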

📝 Abstract

Relational learning is a challenging problem that has motivated a wide range of approaches, including graph-based models (e.g., graph neural networks, graph transformers), tabular methods (e.g., tabular foundation models), and sequence-based approaches (e.g., large language models), each with its own advantages and limitations. We propose RelAgent, an LLM-based autonomous data scientist for relational learning, which operates in two phases. In the search phase, an LLM agent uses database, validation, and evaluation workspace tools to construct SQL feature programs and select a predictive model. In the inference phase, the resulting program is executed without further LLM calls. The final predictor consists of SQL queries and a classical model, which makes predictions fast, deterministic, and intrinsically interpretable: features are human-readable queries, and predictions depend only on the resulting query-defined feature map. This also allows scalable deployment on standard database systems.
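
The search phase can be sketched the same way, under stronger assumptions. The LLM agent and its database, validation, and evaluation workspace tools are replaced here by a fixed dict of candidate SQL programs (CANDIDATE_PROGRAMS) and direct sqlite3 calls, and a one-feature threshold rule (fit_stump) stands in for the classical model. All names, the users/events schema, and the selection rule (best validation accuracy) are hypothetical; the paper's actual agent loop and selection criteria may differ.

```python
import sqlite3

# Hypothetical candidate feature programs. In RelAgent an LLM agent proposes
# these; here a fixed dict stands in for the agent's proposals.
CANDIDATE_PROGRAMS = {
    "n_events": """
        SELECT u.user_id, COUNT(e.event_id)
        FROM users u LEFT JOIN events e ON e.user_id = u.user_id
        GROUP BY u.user_id""",
    "n_purchases": """
        SELECT u.user_id, COUNT(CASE WHEN e.kind = 'purchase' THEN 1 END)
        FROM users u LEFT JOIN events e ON e.user_id = u.user_id
        GROUP BY u.user_id""",
}


def build_demo_db():
    """Toy relational database (hypothetical) with labels and a train/val split."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, label INTEGER, split TEXT);
        CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
        (1, 1, "train"), (2, 0, "train"), (3, 1, "train"),
        (4, 0, "train"), (5, 1, "val"), (6, 0, "val"),
    ])
    conn.executemany("INSERT INTO events (user_id, kind) VALUES (?, ?)", [
        (1, "purchase"), (1, "view"), (2, "view"), (2, "view"), (2, "view"),
        (3, "purchase"), (4, "view"), (5, "purchase"), (5, "view"), (5, "view"),
        (6, "view"), (6, "view"),
    ])
    return conn


def accuracy(t, xs, ys):
    """Accuracy of the one-feature rule: predict 1 when x >= t."""
    return sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)


def fit_stump(xs, ys):
    """Stand-in classical model: pick the threshold with the best train accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = accuracy(t, xs, ys)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def search(conn, programs):
    """Fit each feature program on train, score on validation, keep the best."""
    rows = conn.execute("SELECT user_id, label, split FROM users").fetchall()
    best = None
    for name, sql in programs.items():
        feats = dict(conn.execute(sql).fetchall())  # user_id -> feature value
        train = [(feats[u], y) for u, y, s in rows if s == "train"]
        val = [(feats[u], y) for u, y, s in rows if s == "val"]
        t = fit_stump(*zip(*train))
        val_acc = accuracy(t, *zip(*val))
        if best is None or val_acc > best[2]:
            best = (name, t, val_acc)
    return best  # (program name, threshold, validation accuracy)


print(search(build_demo_db(), CANDIDATE_PROGRAMS))  # → ('n_purchases', 1, 1.0)
```

In this toy data the purchase count separates the labels while the raw event count does not, so validation accuracy selects n_purchases. The selected query and threshold are then all that inference needs.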

Problem

Research questions and friction points this paper is trying to address.

relational learning

graph-based models

tabular methods

sequence-based approaches

feature engineering

Innovation

Methods, ideas, or system contributions that make the work stand out.

Relational Learning

LLM Agent

SQL Feature Programs