🤖 AI Summary
This study investigates the behavioral interpretability of deep neural networks (DNNs) on tabular data and examines whether dataset meta-features can predict their training dynamics and performance. Method: We conduct a systematic evaluation of 32 deep and tree-based models across 300+ heterogeneous tabular datasets, integrating large-scale benchmarking, meta-feature modeling, cross-dataset performance attribution, and statistically robust analysis. Contribution/Results: First, we observe that high-performing models are strongly concentrated within a few architectural families. Second, we introduce a novel, interpretable paradigm for predicting DNN convergence behavior directly from dataset meta-features. Third, we identify a high-quality, stable subset of benchmarks exhibiting consistent model rankings and strong generalization across datasets. Empirically, while relative model rankings vary across datasets, the set of top-performing methods overlaps substantially, and key data characteristics show significant predictive power for DNN training behavior, enabling principled, meta-feature-driven model selection and analysis.
📝 Abstract
Tabular data is prevalent across diverse domains in machine learning. While classical methods like tree-based models have long been effective, Deep Neural Network (DNN)-based methods have recently demonstrated promising performance. However, the diverse characteristics of these methods and the inherent heterogeneity of tabular datasets make understanding and interpreting tabular methods both challenging and prone to unstable observations. In this paper, we conduct in-depth evaluations and comprehensive analyses of tabular methods, with a particular focus on DNN-based models, using a benchmark of over 300 tabular datasets spanning a wide range of task types, sizes, and domains. First, we perform an extensive comparison of 32 state-of-the-art deep and tree-based methods, evaluating their average performance across multiple criteria. Although method ranks vary across datasets, we empirically find that top-performing methods tend to concentrate within a small subset of tabular models, regardless of the criteria used. Next, we investigate whether the training dynamics of deep tabular models can be predicted from dataset properties. This approach not only offers insights into the behavior of deep tabular methods but also identifies a core set of "meta-features" that reflect dataset heterogeneity. Finally, we identify a subset of datasets where method ranks are consistent with the overall benchmark, which can serve as a reliable probe for further tabular analysis.
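The meta-feature idea described above can be sketched as follows: summarize each dataset with a few scalar descriptors, then fit a regressor mapping those descriptors to an observed training-behavior score. This is a minimal illustrative sketch, not the paper's protocol; the specific meta-features, the synthetic benchmark, and the convergence score here are all assumptions for demonstration purposes.

```python
# Hedged sketch: predicting a model's training behavior from dataset
# meta-features. All datasets and the target "behavior" score below are
# synthetic stand-ins, not results from the paper.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def meta_features(X, y):
    """A small, illustrative set of dataset-level descriptors."""
    n_rows, n_cols = X.shape
    probs = np.bincount(y) / len(y)
    entropy = -np.sum(probs * np.log(probs + 1e-12))  # label entropy
    skew = ((X - X.mean(0)) ** 3).mean(0) / (X.std(0) ** 3 + 1e-12)
    return np.array([np.log(n_rows), n_cols, entropy, np.mean(np.abs(skew))])

# Simulate a benchmark of many datasets, each paired with a measured
# training-behavior score (here generated from the meta-features + noise).
metas, scores = [], []
for _ in range(200):
    n, d = rng.integers(100, 2000), rng.integers(5, 50)
    X = rng.normal(size=(n, d))
    y = rng.integers(0, 3, size=n)
    m = meta_features(X, y)
    metas.append(m)
    scores.append(0.5 * m[0] - 0.1 * m[1] + rng.normal(scale=0.1))

# Fit the meta-level predictor: meta-features -> expected behavior score.
reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(np.array(metas), np.array(scores))
```

In a real pipeline the scores would come from logged training curves of deep tabular models across the benchmark, and the fitted regressor (or its feature importances) would indicate which dataset characteristics drive convergence behavior.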