Active In-Context Learning for Tabular Foundation Models

📅 2026-03-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiency of conventional active learning in the cold-start phase for tabular data, where unreliable model uncertainty estimates hinder effective sample selection. To overcome this limitation, the authors propose Tabular Active In-Context Learning (Tab-AICL), a framework that integrates active learning with in-context learning by leveraging the contextual reasoning of the pre-trained tabular foundation model TabPFN. Without updating any model parameters, Tab-AICL iteratively refines the labeled examples that form the model's context, turning query selection into context optimization. Four acquisition strategies are introduced: margin-based uncertainty (TabPFN-Margin), coreset diversity (TabPFN-Coreset), a hybrid of the two (TabPFN-Hybrid), and a scalable two-stage Proxy-Hybrid method. Experiments across 20 classification benchmarks show that Tab-AICL outperforms retrained gradient-boosting baselines within the first 100 labeled samples, substantially improving cold-start sample efficiency as measured by normalized AULC.
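The query loop described in the summary is simple to express in code. Below is a minimal, illustrative sketch of the margin-based variant, assuming the scikit-learn-style `TabPFNClassifier` interface from the `tabpfn` package; the function names, seeding, and budget choices are ours, not the authors' code.

```python
# Minimal sketch of the Tab-AICL loop with margin-based acquisition.
# Assumes the scikit-learn-style TabPFNClassifier from the `tabpfn`
# package; names here are illustrative, not the paper's implementation.
import numpy as np
from tabpfn import TabPFNClassifier

def margin_scores(proba: np.ndarray) -> np.ndarray:
    """Gap between the top-2 class probabilities (small = uncertain)."""
    part = np.sort(proba, axis=1)
    return part[:, -1] - part[:, -2]

def tab_aicl_margin(X_pool, y_pool, seed_idx, budget=100):
    """Iteratively grow the labeled context. No weight updates occur:
    TabPFN conditions on the labeled set in-context at every round."""
    labeled = list(seed_idx)
    unlabeled = [i for i in range(len(X_pool)) if i not in labeled]
    clf = TabPFNClassifier()
    while len(labeled) < budget and unlabeled:
        # "fit" only stores the context; prediction is a forward pass
        clf.fit(X_pool[labeled], y_pool[labeled])
        proba = clf.predict_proba(X_pool[unlabeled])
        query = unlabeled[int(np.argmin(margin_scores(proba)))]
        labeled.append(query)        # oracle reveals y_pool[query]
        unlabeled.remove(query)
    return labeled
```

Because `fit` merely stores the context rather than running gradient steps, each round costs one forward pass over the pool instead of a full retraining run, which is what makes the per-query iteration cheap.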
📝 Abstract
Active learning (AL) reduces labeling cost by querying informative samples, but in tabular settings its cold-start gains are often limited because uncertainty estimates are unreliable when models are trained on very few labels. Tabular foundation models such as TabPFN provide calibrated probabilistic predictions via in-context learning (ICL), i.e., without task-specific weight updates, enabling an AL regime in which the labeled context, rather than the parameters, is iteratively optimized. We formalize Tabular Active In-Context Learning (Tab-AICL) and instantiate it with four acquisition rules: uncertainty (TabPFN-Margin), diversity (TabPFN-Coreset), an uncertainty-diversity hybrid (TabPFN-Hybrid), and a scalable two-stage method (TabPFN-Proxy-Hybrid) that shortlists candidates using a lightweight linear proxy before TabPFN-based selection. Across 20 classification benchmarks, Tab-AICL improves cold-start sample efficiency over retrained gradient-boosting baselines (CatBoost-Margin and XGBoost-Margin), measured by normalized AULC up to 100 labeled samples.
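To make the remaining acquisition rules concrete, here is a hedged sketch of a hybrid uncertainty-diversity scoring step and of the stage-one proxy shortlist. The mixing weight `alpha`, the shortlist size `k`, and the use of `LogisticRegression` as the proxy are illustrative assumptions; the abstract only specifies that the proxy is a lightweight linear model.

```python
# Illustrative hybrid (uncertainty x diversity) acquisition and the
# two-stage proxy shortlist. The combination rule, `alpha`, and `k`
# are assumptions, not the paper's exact formulation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import pairwise_distances

def hybrid_query(X_lab, X_unlab, proba_unlab, alpha=0.5):
    """Pick the unlabeled point maximizing a weighted sum of margin
    uncertainty and k-center (coreset) distance to the labeled set."""
    part = np.sort(proba_unlab, axis=1)
    uncertainty = 1.0 - (part[:, -1] - part[:, -2])
    dist = pairwise_distances(X_unlab, X_lab).min(axis=1)
    dist = dist / (dist.max() + 1e-12)   # normalize to [0, 1]
    return int(np.argmax(alpha * uncertainty + (1 - alpha) * dist))

def proxy_shortlist(X_lab, y_lab, X_unlab, k=256):
    """Stage 1 of Proxy-Hybrid: a cheap linear proxy ranks the pool by
    margin; only the top-k most uncertain points reach TabPFN scoring."""
    proxy = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    part = np.sort(proxy.predict_proba(X_unlab), axis=1)
    margin = part[:, -1] - part[:, -2]
    return np.argsort(margin)[:k]        # indices into X_unlab
```

The two-stage split is what makes the hybrid rule scale: the O(pool) work is done by the linear proxy, and the comparatively expensive TabPFN forward passes only score the shortlist.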
Problem

Research questions and friction points this paper is trying to address.

active learning
tabular data
cold-start
labeling cost
in-context learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Active In-Context Learning
Tabular Foundation Models
In-Context Learning
Sample Efficiency
Uncertainty-Diversity Acquisition
Wilailuck Treerath
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milano, Italy
Fabrizio Pittorino
Politecnico di Milano
Machine Learning · Statistical Physics · Computational Neuroscience