iLTM: Integrated Large Tabular Model

📅 2025-11-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
In tabular data modeling, gradient-boosted decision trees (GBDTs) remain the dominant choice in practice, while existing deep learning models struggle to simultaneously achieve strong generalization, parameter efficiency, and adaptability to high-dimensional inputs. This paper introduces iLTM, a unified foundation model for tabular data that integrates tree-derived embeddings, dimensionality-agnostic representations, a meta-trained hypernetwork, MLPs, and retrieval augmentation in a single architecture. The core contribution is the meta-trained hypernetwork, which couples tree-based and deep components and enables effective cross-task transfer with lightweight fine-tuning. Pretrained on over 1,800 classification datasets, iLTM achieves state-of-the-art results on both classification and regression benchmarks with minimal fine-tuning, outperforming carefully tuned GBDTs and leading deep tabular models, generalizing across diverse domains, and improving deployment efficiency through reduced inference latency and parameter count.
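
To make the hypernetwork idea concrete, here is a minimal sketch of one common construction: a generator network maps a dataset-level task embedding to the flat parameter vector of a small target MLP, which then scores per-row feature vectors for that dataset. All names and dimensions below (HyperNet, task_dim, the two-layer target network) are illustrative assumptions, not iLTM's actual architecture.

```python
# Hedged sketch: a hypernetwork that emits the weights of a small target MLP
# from a dataset-level embedding. Illustrative only; iLTM's design may differ.
import torch
import torch.nn as nn

class HyperNet(nn.Module):
    """Map a task embedding to the parameters of a 2-layer target MLP."""
    def __init__(self, task_dim: int, in_dim: int, hidden: int, out_dim: int):
        super().__init__()
        self.in_dim, self.hidden, self.out_dim = in_dim, hidden, out_dim
        n_params = in_dim * hidden + hidden + hidden * out_dim + out_dim
        self.generator = nn.Sequential(
            nn.Linear(task_dim, 256), nn.ReLU(), nn.Linear(256, n_params)
        )

    def forward(self, task_emb: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        p = self.generator(task_emb)          # flat parameter vector
        i = 0
        W1 = p[i:i + self.in_dim * self.hidden].view(self.hidden, self.in_dim)
        i += self.in_dim * self.hidden
        b1 = p[i:i + self.hidden]; i += self.hidden
        W2 = p[i:i + self.hidden * self.out_dim].view(self.out_dim, self.hidden)
        i += self.hidden * self.out_dim
        b2 = p[i:i + self.out_dim]
        h = torch.relu(x @ W1.T + b1)         # target MLP with generated weights
        return h @ W2.T + b2

hyper = HyperNet(task_dim=32, in_dim=16, hidden=64, out_dim=2)
task_emb = torch.randn(32)   # e.g. pooled from a dataset's support examples
x = torch.randn(8, 16)       # e.g. per-row embeddings for that dataset
logits = hyper(task_emb, x)  # shape (8, 2)
```

Meta-training would optimize the generator across many datasets, so a new task needs only a fresh task embedding (and, for regression, a light fine-tune) rather than an MLP trained from scratch.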

📝 Abstract
Tabular data underpins decisions across science, industry, and public services. Despite rapid progress, advances in deep learning have not fully carried over to the tabular domain, where gradient-boosted decision trees (GBDTs) remain a default choice in practice. We present iLTM, an integrated Large Tabular Model that unifies tree-derived embeddings, dimensionality-agnostic representations, a meta-trained hypernetwork, multilayer perceptrons (MLPs), and retrieval within a single architecture. Pretrained on more than 1,800 heterogeneous classification datasets, iLTM achieves consistently superior performance across tabular classification and regression tasks, from small datasets to large and high-dimensional ones. After light fine-tuning, the meta-trained hypernetwork transfers to regression targets, matching or surpassing strong baselines. Extensive experiments show that iLTM outperforms well-tuned GBDTs and leading deep tabular models while requiring less task-specific tuning. By bridging the gap between tree-based and neural methods, iLTM offers a new framework for tabular foundation models, enabling robust, adaptable, and scalable tabular learning.
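
The abstract lists retrieval as one of the unified components. Below is a minimal sketch of what retrieval augmentation typically looks like in tabular prediction, assuming a kNN index over embeddings of training rows; the index choice and all names here are assumptions for illustration, not iLTM's actual mechanism.

```python
# Hedged sketch: retrieve nearest training rows to condition a prediction on.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
train_emb = rng.normal(size=(500, 16))    # embeddings of training rows (toy data)
train_y = rng.integers(0, 2, size=500)    # their labels

index = NearestNeighbors(n_neighbors=5).fit(train_emb)

def retrieve_context(query_emb: np.ndarray):
    """Return embeddings and labels of the nearest training rows."""
    _, idx = index.kneighbors(query_emb[None, :])
    return train_emb[idx[0]], train_y[idx[0]]

ctx_emb, ctx_y = retrieve_context(rng.normal(size=16))
print(ctx_emb.shape, ctx_y)               # (5, 16) plus 5 neighbor labels
```

The retrieved neighbors and their labels would then be passed to the model as extra context alongside the query row.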
Problem

Research questions and friction points this paper is trying to address.

Bridging the performance gap between deep learning methods and GBDTs on tabular data
Unifying diverse architectural components for robust tabular classification and regression
Reducing task-specific tuning requirements while surpassing well-tuned GBDT performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unifies tree-derived embeddings and a meta-trained hypernetwork in a single architecture (see the sketch after this list)
Meta-trained hypernetwork transfers to regression targets after light fine-tuning
Outperforms well-tuned gradient-boosted trees while requiring less task-specific tuning
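
As a sketch of the first point above: one standard way to obtain tree-derived embeddings is to fit a tree ensemble, record the leaf each row reaches in every tree, and embed those leaf ids as tokens. The ensemble choice, the shared leaf-id vocabulary, and the mean pooling below are simplifying assumptions, not the paper's exact recipe.

```python
# Hedged sketch: leaf indices from a fitted ensemble become embedding tokens.
import torch
import torch.nn as nn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
forest = RandomForestClassifier(n_estimators=10, max_depth=4, random_state=0)
forest.fit(X, y)

leaves = forest.apply(X)             # (n_samples, n_trees): leaf node id per tree
n_nodes = int(leaves.max()) + 1      # crude shared vocabulary over node ids

embed = nn.Embedding(n_nodes, 8)     # one 8-d vector per node id (shared by trees)
tokens = torch.as_tensor(leaves, dtype=torch.long)
per_tree = embed(tokens)             # (n_samples, n_trees, 8)
pooled = per_tree.mean(dim=1)        # fixed-size, dimension-agnostic row embedding
print(pooled.shape)                  # torch.Size([200, 8])
```

Because the pooled vector's size is independent of the raw column count, a representation like this is one way to feed tables of varying width into a shared downstream network.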
Authors

David Bonet (Stanford University; Artificial Intelligence, Machine Learning, Deep Learning, Signal Processing, Computational Biology)
Marçal Comajoan Cara (University of California, Berkeley)
Alvaro Calafell (École Polytechnique)
Daniel Mas Montserrat (Stanford University)
Alexander G. Ioannidis (Assistant Professor)