🤖 AI Summary
This work addresses the challenge of enabling in-context learning for prediction tasks in relational databases without task-specific training, where predictive signals are scattered across multiple linked tables. The authors propose a general-purpose pipeline that automatically aggregates relevant relational information to construct enriched feature representations for target rows, thereby enabling off-the-shelf tabular foundation models to be applied directly. This approach represents the first effective extension of in-context learning to relational database settings, offering a simple, reproducible solution accompanied by an open-source, scikit-learn-style toolkit for ease of use. Empirical evaluations on benchmarks such as RelBench and 4DBInfer demonstrate substantial improvements over zero-shot baselines, with performance on certain tasks even surpassing that of specialized supervised models fine-tuned for those tasks.
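The featurization step described above can be illustrated with a small sketch: aggregate the rows of a linked table per target row and merge the results back, yielding a single flat table that a tabular model can consume. The tables, column names, and `featurize` helper below are illustrative assumptions, not the paper's actual pipeline.

```python
import pandas as pd

# Toy relational database: a target table (users) and one linked table (orders).
users = pd.DataFrame({"user_id": [1, 2, 3], "age": [25, 40, 31]})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "user_id": [1, 1, 2, 1],
    "amount": [5.0, 7.5, 3.0, 2.5],
})

def featurize(target: pd.DataFrame, linked: pd.DataFrame,
              key: str, value: str) -> pd.DataFrame:
    """Aggregate linked rows per target row and materialize a flat table."""
    aggs = (linked.groupby(key)[value]
                  .agg(["count", "sum", "mean"])
                  .add_prefix(f"{value}_")
                  .reset_index())
    # Left-join keeps target rows with no linked records; fill their stats with 0.
    return target.merge(aggs, on=key, how="left").fillna(0)

flat = featurize(users, orders, key="user_id", value="amount")
# `flat` now has columns: user_id, age, amount_count, amount_sum, amount_mean,
# and can be fed directly to an off-the-shelf tabular model.
```

In the paper's recipe this augmented table is then passed to a tabular foundation model together with a small set of labeled rows as in-context examples.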
📄 Abstract
Recent advances in tabular in-context learning (ICL) show that a single pretrained model can adapt to new prediction tasks from a small set of labeled examples, avoiding per-task training and heavy tuning. However, many real-world tasks live in relational databases, where predictive signal is spread across multiple linked tables rather than a single flat table. We show that tabular ICL can be extended to relational prediction with a simple recipe: automatically featurize each target row using relational aggregations over its linked records, materialize the resulting augmented table, and run an off-the-shelf tabular foundation model on it. We package this approach in *RDBLearn* (https://github.com/HKUSHXLab/rdblearn), an easy-to-use toolkit with a scikit-learn-style estimator interface that makes it straightforward to swap different tabular ICL backends; a complementary agent-specific interface is provided as well. Across a broad collection of RelBench and 4DBInfer datasets, RDBLearn is the best-performing foundation model approach we evaluate, at times even outperforming strong supervised baselines trained or fine-tuned on each dataset.
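A scikit-learn-style estimator interface with swappable backends typically looks like the sketch below. This is a hypothetical illustration of the pattern, not RDBLearn's actual API; the class names, the `backend` parameter, and the stand-in `MeanBackend` are all assumptions.

```python
import pandas as pd

class RelationalICLEstimator:
    """Hypothetical scikit-learn-style wrapper: delegates fit/predict to a
    pluggable tabular ICL backend. The real RDBLearn interface may differ."""

    def __init__(self, backend):
        self.backend = backend  # any object exposing fit(X, y) and predict(X)

    def fit(self, X: pd.DataFrame, y: pd.Series):
        self.backend.fit(X, y)
        return self  # scikit-learn convention: fit returns self

    def predict(self, X: pd.DataFrame):
        return self.backend.predict(X)

class MeanBackend:
    """Trivial stand-in for a tabular foundation model backend:
    always predicts the mean of the labeled examples."""
    def fit(self, X, y):
        self.mean_ = float(y.mean())
        return self
    def predict(self, X):
        return [self.mean_] * len(X)

# Swapping backends only changes the constructor argument.
est = RelationalICLEstimator(MeanBackend())
est.fit(pd.DataFrame({"f": [1, 2]}), pd.Series([10.0, 20.0]))
preds = est.predict(pd.DataFrame({"f": [3, 4]}))
```

The point of the pattern is that a different tabular ICL model can be dropped in without changing the surrounding pipeline code.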