🤖 AI Summary
This study investigates whether dataset meta-features can explain performance differences among model families on tabular data. Using results from the TabArena benchmark, it presents the first systematic evaluation of how well meta-features explain performance gaps across model families, combining statistical hypothesis testing under false discovery rate control with leave-one-dataset-out prediction. The meta-features studied do not robustly account for the performance gap between neural networks and tree-based models. Only limited, pair-specific associations emerge for other model comparisons, and overall predictive performance does not meaningfully exceed that of a simple baseline. The work thus provides empirical evidence on the limits of current meta-features for meta-learning and automated machine learning.
📝 Abstract
Tabular foundation models are on the rise, yet traditional models still perform well on many tasks, so choosing the right model for a tabular dataset remains difficult. We investigate whether dataset meta-features can explain performance gaps between model families on tabular prediction tasks. Using the TabArena benchmark results, we analyze dataset-level performance gaps and relate them to model-agnostic dataset descriptors. After strict statistical tests with false discovery rate control, we find that (1) for neural network vs. tree-based model gaps, no meta-feature survives false discovery rate control, (2) for non-foundation vs. foundation model gaps, one association is robust but does not generalize when tested in leave-one-dataset-out prediction, and (3) for TabICLv2 vs. TabPFN-2.6, one robust association also improves held-out prediction. Furthermore, we conduct a leave-one-dataset-out analysis and find that meta-feature predictors fail to improve meaningfully over a simple baseline. Overall, our results highlight the heterogeneity of tabular datasets and show that global meta-feature approaches are not robust enough to explain performance gaps on the 51 TabArena datasets.
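The two analysis steps named in the abstract, false discovery rate control over many meta-feature tests and leave-one-dataset-out prediction against a simple baseline, can be illustrated with a minimal sketch. The abstract does not say which correction procedure or which predictor the authors use, so this sketch assumes the Benjamini-Hochberg step-up procedure and a one-feature least-squares fit compared with a training-mean baseline. The function names `benjamini_hochberg` and `lodo_mae` are hypothetical and not taken from the paper.

```python
from statistics import mean


def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level alpha.

    Assumption: Benjamini-Hochberg is used here for illustration only; the
    paper's actual correction procedure is not stated in the abstract.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def lodo_mae(x, y):
    """Leave-one-dataset-out mean absolute error for two predictors.

    x: one meta-feature value per dataset (list of numbers).
    y: one performance gap per dataset (list of numbers).
    Returns (model_mae, baseline_mae), where the model is a one-feature
    least-squares line and the baseline predicts the training mean of y.
    Both predictors are illustrative assumptions.
    """
    model_err, base_err = [], []
    for i in range(len(y)):
        # Hold out dataset i and fit on all remaining datasets.
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        x_bar, y_bar = mean(xs), mean(ys)
        sxx = sum((a - x_bar) ** 2 for a in xs)
        sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        pred = y_bar + slope * (x[i] - x_bar)
        model_err.append(abs(pred - y[i]))
        base_err.append(abs(y_bar - y[i]))
    return mean(model_err), mean(base_err)
```

Under these assumptions, a meta-feature would count as useful only if it is among the rejected hypotheses and its leave-one-dataset-out error is clearly below the baseline error, which mirrors the distinction the abstract draws between associations that are robust and associations that also generalize.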