OmniTabBench: Mapping the Empirical Frontiers of GBDTs, Neural Networks, and Foundation Models for Tabular Data at Scale

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing tabular data benchmarks are limited in scale and subject to selection bias, making it difficult to systematically compare tree-based models, neural networks, and foundation models. This work proposes OmniTabBench, the largest tabular benchmark to date, comprising 3,030 diverse tabular datasets augmented with industry annotations generated by large language models, together with a unified evaluation framework and a decoupled meta-feature analysis system. Empirical results show that no single model consistently dominates across all scenarios; instead, performance depends significantly on meta-features such as dataset size, feature types, and data distributions. The study provides fine-grained, actionable guidance for model selection in tabular learning tasks.
📝 Abstract
While traditional tree-based ensemble methods have long dominated tabular tasks, deep neural networks and emerging foundation models have challenged this primacy, yet no consensus exists on a universally superior paradigm. Existing benchmarks typically contain fewer than 100 datasets, raising concerns about evaluation sufficiency and potential selection bias. To address these limitations, we introduce OmniTabBench, the largest tabular benchmark to date, comprising 3,030 datasets spanning diverse tasks, comprehensively collected from a wide range of sources and categorized by industry using large language models. We conduct an unprecedented large-scale empirical evaluation of state-of-the-art models from all model families on OmniTabBench, confirming the absence of a dominant winner. Furthermore, through a decoupled meta-feature analysis, which examines individual properties such as dataset size, feature types, and feature and target skewness/kurtosis, we elucidate the conditions favoring specific model categories, providing clearer, more actionable guidance than prior compound-metric studies.
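The decoupled analysis the abstract describes rests on computing per-dataset meta-features such as size, feature count, and skewness/kurtosis, then relating each property individually to model performance. A minimal sketch of such meta-feature extraction is shown below; the function and field names are illustrative assumptions, not OmniTabBench's actual API.

```python
# Hedged sketch: per-dataset meta-features of the kind the paper's decoupled
# analysis examines (dataset size, feature count, skewness/kurtosis).
import numpy as np
from scipy import stats

def extract_meta_features(X: np.ndarray, y: np.ndarray) -> dict:
    """Compute simple meta-features for a numeric tabular dataset (X, y)."""
    n_rows, n_features = X.shape
    return {
        "n_rows": n_rows,
        "n_features": n_features,
        # Mean absolute skewness / mean excess kurtosis across features
        # (Fisher definition, as returned by scipy.stats).
        "feature_skewness": float(np.mean(np.abs(stats.skew(X, axis=0)))),
        "feature_kurtosis": float(np.mean(stats.kurtosis(X, axis=0))),
        "target_skewness": float(stats.skew(y)),
        "target_kurtosis": float(stats.kurtosis(y)),
    }

# Example: a heavy-tailed synthetic dataset yields high feature skewness,
# the kind of property the paper links to model-family preferences.
rng = np.random.default_rng(0)
X = rng.lognormal(size=(500, 8))  # strongly right-skewed features
y = rng.normal(size=500)
meta = extract_meta_features(X, y)
```

In a benchmark-wide study, a table of such vectors (one row per dataset) can be joined against per-model results so each meta-feature's effect is examined in isolation rather than through a compound metric.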
Problem

Research questions and friction points this paper is trying to address.

tabular data
model evaluation
benchmarking
GBDTs
foundation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

OmniTabBench
tabular data benchmarking
metafeature analysis
foundation models
empirical evaluation