Fine-tuned In-Context Learning Transformers are Excellent Tabular Data Classifiers

📅 2024-05-22
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
To address the limited classification robustness of tabular data models under out-of-distribution (OOD) and few-shot settings, this paper proposes TabForest and TabForestPFN. The authors design a forest dataset generator that produces unrealistic synthetic datasets with deliberately complex decision boundaries, and use it to pretrain an in-context learning (ICL) transformer, TabForest. Building on this, they introduce TabForestPFN, which combines the forest generator with TabPFN's generator to pair strong zero-shot generalization with supervised fine-tuning. Crucially, the work shows that fine-tuned ICL-transformers can model complex nonlinear decision boundaries, a property regular neural networks do not exhibit. Experiments across real-world tabular benchmarks show that TabForestPFN achieves significantly better fine-tuned performance than TabPFN while preserving good zero-shot capability, and substantially improves classification robustness under both OOD and few-shot conditions.

📝 Abstract
The recently introduced TabPFN pretrains an In-Context Learning (ICL) transformer on synthetic data to perform tabular data classification. In this work, we extend TabPFN to the fine-tuning setting, resulting in a significant performance boost. We also discover that fine-tuning enables ICL-transformers to create complex decision boundaries, a property regular neural networks do not have. Based on this observation, we propose to pretrain ICL-transformers on a new forest dataset generator which creates datasets that are unrealistic, but have complex decision boundaries. TabForest, the ICL-transformer pretrained on this dataset generator, shows better fine-tuning performance when pretrained on more complex datasets. Additionally, TabForest outperforms TabPFN on some real-world datasets when fine-tuning, despite having lower zero-shot performance due to the unrealistic nature of the pretraining datasets. By combining both dataset generators, we create TabForestPFN, an ICL-transformer that achieves excellent fine-tuning performance and good zero-shot performance.
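The core idea of the forest dataset generator can be illustrated with a short sketch. The paper does not publish this exact procedure here, so the following is a hypothetical re-creation of the concept: labels for random, unrealistic feature vectors are produced by a randomly grown axis-aligned decision tree, which yields the complex, piecewise-constant decision boundaries the pretraining targets. All function names and parameters are illustrative assumptions.

```python
import numpy as np

def random_tree(depth, n_features, n_classes, rng):
    """Recursively build a random axis-aligned decision tree.

    Each internal node splits a random feature at a random threshold;
    each leaf emits a random class label. The resulting labeling
    function has a complex decision boundary by construction.
    """
    if depth == 0:
        return ("leaf", int(rng.integers(n_classes)))
    feat = int(rng.integers(n_features))
    thresh = float(rng.normal())
    left = random_tree(depth - 1, n_features, n_classes, rng)
    right = random_tree(depth - 1, n_features, n_classes, rng)
    return ("split", feat, thresh, left, right)

def tree_predict(tree, x):
    """Route a single sample down the random tree to its leaf label."""
    while tree[0] == "split":
        _, feat, thresh, left, right = tree
        tree = left if x[feat] <= thresh else right
    return tree[1]

def sample_forest_dataset(n_samples=256, n_features=8, n_classes=3,
                          depth=6, seed=0):
    """Sample one synthetic pretraining dataset: unrealistic Gaussian
    features labeled by a random tree (hypothetical generator sketch)."""
    rng = np.random.default_rng(seed)
    tree = random_tree(depth, n_features, n_classes, rng)
    X = rng.normal(size=(n_samples, n_features))
    y = np.array([tree_predict(tree, x) for x in X])
    return X, y
```

In this reading, an ICL-transformer pretrained on a stream of such datasets never sees realistic feature distributions, which matches the abstract's observation that TabForest has lower zero-shot performance but benefits strongly from fine-tuning on real data.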
Problem

Research questions and friction points this paper is trying to address.

Enhanced recognition
Complex real-world data
Unseen data classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

TabPFN
TabForest
TabForestPFN
Felix den Breejen
Graduate School of AI, Korea Advanced Institute of Science and Technology (KAIST)
Sangmin Bae
Postdoc at KAIST AI || PhD at KAIST AI
Adaptive Computation · Multimodal Learning
Stephen Cha
Graduate School of AI, Korea Advanced Institute of Science and Technology (KAIST)
Se-Young Yun
Graduate School of AI, Korea Advanced Institute of Science and Technology (KAIST)