🤖 AI Summary
In tabular data modeling, gradient-boosted decision trees (GBDTs) remain dominant in practice, while existing deep learning models struggle to simultaneously achieve strong generalization, parameter efficiency, and adaptability to high-dimensional inputs. This paper introduces iLTM, the first unified foundation model for tabular data that integrates tree-derived embeddings, dimension-agnostic representations, a meta-trained hypernetwork, MLPs, and retrieval augmentation. Its core innovation is a meta-trained hypernetwork that deeply unifies tree-based and deep architectures, enabling effective cross-task transfer and lightweight fine-tuning. Pretrained on over 1,800 classification datasets, iLTM achieves state-of-the-art performance on both classification and regression benchmarks with minimal fine-tuning, outperforming meticulously tuned GBDTs and leading deep tabular models. It generalizes well across diverse domains and improves deployment efficiency through reduced inference latency and parameter count.
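To make the "tree-derived embeddings" idea concrete, here is a minimal, hypothetical sketch (not iLTM's actual implementation): a GBDT is fitted, each sample is mapped to the leaf it lands in per tree, and the one-hot-encoded leaf indices serve as an embedding for a downstream MLP. All dataset names and hyperparameters below are illustrative assumptions.

```python
# Hypothetical sketch of tree-derived embeddings: leaf indices from a
# fitted GBDT are one-hot encoded and fed to an MLP. This illustrates the
# general technique, not iLTM's actual architecture.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import OneHotEncoder
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in dataset (assumption; any tabular task would do).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1. Fit a small GBDT and read off the leaf index each sample reaches
#    in each tree; apply() returns shape (n_samples, n_estimators, 1)
#    for binary classification.
gbdt = GradientBoostingClassifier(n_estimators=30, max_depth=3, random_state=0)
gbdt.fit(X_tr, y_tr)
leaves_tr = gbdt.apply(X_tr)[:, :, 0]
leaves_te = gbdt.apply(X_te)[:, :, 0]

# 2. One-hot encode the leaf indices into a sparse embedding whose size
#    depends only on the trees, not on the raw feature dimensionality.
enc = OneHotEncoder(handle_unknown="ignore")
E_tr = enc.fit_transform(leaves_tr)
E_te = enc.transform(leaves_te)

# 3. Train an MLP on the tree-derived embedding instead of raw features.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
mlp.fit(E_tr, y_tr)
acc = mlp.score(E_te, y_te)
print(f"test accuracy: {acc:.2f}")
```

Because the embedding is built from leaf memberships rather than raw columns, the same downstream network can consume datasets with very different input dimensionalities, which is the property the summary refers to as dimension-agnostic representations.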
📝 Abstract
Tabular data underpins decisions across science, industry, and public services. Despite rapid progress, advances in deep learning have not fully carried over to the tabular domain, where gradient-boosted decision trees (GBDTs) remain the default choice in practice. We present iLTM, an integrated Large Tabular Model that unifies tree-derived embeddings, dimension-agnostic representations, a meta-trained hypernetwork, multilayer perceptrons (MLPs), and retrieval within a single architecture. Pretrained on more than 1,800 heterogeneous classification datasets, iLTM achieves consistently superior performance across tabular classification and regression tasks, from small datasets to large, high-dimensional ones. After light fine-tuning, the meta-trained hypernetwork transfers to regression targets, matching or surpassing strong baselines. Extensive experiments show that iLTM outperforms well-tuned GBDTs and leading deep tabular models while requiring less task-specific tuning. By bridging the gap between tree-based and neural methods, iLTM offers a new framework for tabular foundation models, enabling robust, adaptable, and scalable tabular learning.