🤖 AI Summary
This work investigates the transferability of task-agnostic embeddings for tabular data. It systematically evaluates representations from tabular foundation models (TabPFN and TabICL) against classical feature engineering, specifically TableVectorizer, on downstream tasks including outlier detection (ADBench) and supervised learning (TabArena Lite). The study finds that TableVectorizer's simple features match or exceed the foundation-model embeddings on these benchmarks while being up to three orders of magnitude faster. These results challenge the assumption that complex foundation models are inherently superior for generic tabular representation learning, and they suggest that simple, computationally efficient feature engineering is a practical, deployable option for task-agnostic tabular embeddings.
📝 Abstract
Recent foundation models for tabular data achieve strong task-specific performance via in-context learning. However, they focus on direct prediction, encapsulating both representation learning and task-specific inference inside a single, resource-intensive network. This work focuses specifically on representation learning, i.e., on transferable, task-agnostic embeddings. We systematically evaluate task-agnostic representations from tabular foundation models (TabPFN and TabICL) alongside classical feature engineering (TableVectorizer) across a variety of application tasks, such as outlier detection (ADBench) and supervised learning (TabArena Lite). We find that simple TableVectorizer features achieve comparable or superior performance while being up to three orders of magnitude faster than tabular foundation models. The code is available at https://github.com/ContactSoftwareAI/TabEmbedBench.
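To make the evaluation setup concrete, the following is a minimal sketch of the "embed first, solve the task afterwards" idea the abstract describes. It is not the benchmark code and not the TableVectorizer implementation: the helpers `fit_embedder`, `embed`, and `knn_outlier_scores` are hypothetical stand-ins, the toy table is made up for illustration, and the embedding is just standardized numeric columns plus one-hot encoded categorical columns. The point is only that the embedding step never sees a task label, so the same vectors could feed an outlier detector or a supervised model.

```python
import math
from statistics import mean, pstdev


def fit_embedder(rows):
    """Learn per-column statistics without any task label (hypothetical helper).

    Numeric columns store (mean, std); other columns store their category list.
    """
    spec = []
    for col in zip(*rows):
        if all(isinstance(v, (int, float)) for v in col):
            std = pstdev(col) or 1.0  # avoid division by zero for constant columns
            spec.append(("num", mean(col), std))
        else:
            spec.append(("cat", sorted(set(col))))
    return spec


def embed(rows, spec):
    """Map each row to a vector: z-scored numbers, one-hot categories."""
    vectors = []
    for row in rows:
        vec = []
        for value, s in zip(row, spec):
            if s[0] == "num":
                vec.append((value - s[1]) / s[2])
            else:
                vec.extend(1.0 if value == c else 0.0 for c in s[1])
        vectors.append(vec)
    return vectors


def knn_outlier_scores(vectors, k=2):
    """Downstream task on top of the embeddings: mean distance to the k nearest rows."""
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(math.dist(v, w) for j, w in enumerate(vectors) if j != i)
        scores.append(mean(dists[:k]))
    return scores


# Toy table (made up): two numeric columns and one categorical column.
rows = [
    (1.0, 20.0, "a"),
    (1.1, 21.0, "a"),
    (0.9, 19.5, "b"),
    (1.2, 20.5, "b"),
    (9.0, 80.0, "c"),  # deliberately unusual row
]

spec = fit_embedder(rows)
vectors = embed(rows, spec)
scores = knn_outlier_scores(vectors, k=2)
most_anomalous = max(range(len(rows)), key=scores.__getitem__)
print(most_anomalous, [round(s, 2) for s in scores])
```

In a setup closer to the one evaluated, the `fit_embedder`/`embed` pair would be replaced by the embedding method under test (TableVectorizer features or a foundation model's internal representation), and the distance-based score by the detectors or predictors used in the respective benchmark.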