🤖 AI Summary
This work addresses the challenge of enabling in-context learning for prediction tasks in relational databases without task-specific training, where predictive signals are scattered across multiple linked tables. The authors propose a general-purpose pipeline that automatically aggregates relevant relational information to construct enriched feature representations for target rows, thereby enabling off-the-shelf tabular foundation models to be applied directly. This approach represents the first effective extension of in-context learning to relational database settings, offering a simple, reproducible solution accompanied by an open-source, scikit-learn-style toolkit for ease of use. Empirical evaluations on benchmarks such as RelBench and 4DBInfer demonstrate substantial improvements over zero-shot baselines, with performance on certain tasks even surpassing that of specialized supervised models fine-tuned for those tasks.
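The featurization step described above can be illustrated with a small sketch: aggregate the rows of a linked table per target row and merge the results back, yielding a single flat table that a tabular model can consume. The tables, column names, and `featurize` helper below are illustrative assumptions, not the paper's actual pipeline.

```python
import pandas as pd

# Toy relational database: a target table (users) and one linked table (orders).
users = pd.DataFrame({"user_id": [1, 2, 3], "age": [25, 40, 31]})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "user_id": [1, 1, 2, 1],
    "amount": [5.0, 7.5, 3.0, 2.5],
})

def featurize(target: pd.DataFrame, linked: pd.DataFrame,
              key: str, value: str) -> pd.DataFrame:
    """Aggregate linked rows per target row and materialize a flat table."""
    aggs = (linked.groupby(key)[value]
                  .agg(["count", "sum", "mean"])
                  .add_prefix(f"{value}_")
                  .reset_index())
    # Left-join keeps target rows with no linked records; fill their stats with 0.
    return target.merge(aggs, on=key, how="left").fillna(0)

flat = featurize(users, orders, key="user_id", value="amount")
# `flat` now has columns: user_id, age, amount_count, amount_sum, amount_mean,
# and can be fed directly to an off-the-shelf tabular model.
```

In the paper's recipe this augmented table is then passed to a tabular foundation model together with a small set of labeled rows as in-context examples.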
📄 Abstract
Recent advances in tabular in-context learning (ICL) show that a single pretrained model can adapt to new prediction tasks from a small set of labeled examples, avoiding per-task training and heavy tuning. However, many real-world tasks live in relational databases, where predictive signal is spread across multiple linked tables rather than a single flat table. We show that tabular ICL can be extended to relational prediction with a simple recipe: automatically featurize each target row using relational aggregations over its linked records, materialize the resulting augmented table, and run an off-the-shelf tabular foundation model on it. We package this approach in *RDBLearn* (https://github.com/HKUSHXLab/rdblearn), an easy-to-use toolkit with a scikit-learn-style estimator interface that makes it straightforward to swap different tabular ICL backends; a complementary agent-specific interface is provided as well. Across a broad collection of RelBench and 4DBInfer datasets, RDBLearn is the best-performing foundation model approach we evaluate, at times even outperforming strong supervised baselines trained or fine-tuned on each dataset.
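A scikit-learn-style estimator interface with swappable backends typically looks like the sketch below. This is a hypothetical illustration of the pattern, not RDBLearn's actual API; the class names, the `backend` parameter, and the stand-in `MeanBackend` are all assumptions.

```python
import pandas as pd

class RelationalICLEstimator:
    """Hypothetical scikit-learn-style wrapper: delegates fit/predict to a
    pluggable tabular ICL backend. The real RDBLearn interface may differ."""

    def __init__(self, backend):
        self.backend = backend  # any object exposing fit(X, y) and predict(X)

    def fit(self, X: pd.DataFrame, y: pd.Series):
        self.backend.fit(X, y)
        return self  # scikit-learn convention: fit returns self

    def predict(self, X: pd.DataFrame):
        return self.backend.predict(X)

class MeanBackend:
    """Trivial stand-in for a tabular foundation model backend:
    always predicts the mean of the labeled examples."""
    def fit(self, X, y):
        self.mean_ = float(y.mean())
        return self
    def predict(self, X):
        return [self.mean_] * len(X)

# Swapping backends only changes the constructor argument.
est = RelationalICLEstimator(MeanBackend())
est.fit(pd.DataFrame({"f": [1, 2]}), pd.Series([10.0, 20.0]))
preds = est.predict(pd.DataFrame({"f": [3, 4]}))
```

The point of the pattern is that a different tabular ICL model can be dropped in without changing the surrounding pipeline code.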