LUCoS: Latent Unsupervised Context Selection for Tabular Foundation Models

📅 2026-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses context selection for tabular foundation models in the unlabeled cold-start setting, where instances must be chosen before any labels are available. The authors propose LUCoS, which uses embeddings from an unsupervised Prior-Fitted Network (PFN) to build a latent representation space and selects representative medoids in that space as context. Measuring representativeness in latent space avoids the unreliable distances of the original feature space, where heterogeneous types, mixed scales, and nonlinear interactions distort the geometry. On 67 OpenML-CC18 datasets across six low-label budgets, LUCoS ranks first under mean AUC, accuracy, and F1, with conclusions stable across metrics and dataset-level robustness checks.
📝 Abstract
Selecting which instances to label is a key challenge in low-label tabular learning. For recent Tabular Foundation Models such as TabPFN, context selection directly determines predictive performance. Supervised oracle experiments show that carefully chosen labeled context sets can strongly outperform random selection under the same labeling budget. However, the cold-start setting, where instances must be selected before any labels are available, has received little attention in the TFM literature. This problem is fundamentally geometric. In vision and language, foundation models induce embedding spaces where simple geometric selection methods are effective. In contrast, tabular instance selection has so far been performed predominantly in the original tabular space, which lacks a natural metric; heterogeneous types, mixed scales, and nonlinear interactions make raw-space distances unreliable for context construction, and original-space selection falls below random on the majority of datasets as the budget grows. We propose LUCoS (Latent Unsupervised Context Selection), which replaces raw-feature geometry with the latent geometry induced by embeddings from an unsupervised Prior-Fitted Network (PFN) and selects representative medoids as context. Evaluated on 67 OpenML-CC18 datasets across six low-label budgets, LUCoS ranks first under mean AUC, ACC, and F1, with conclusions stable across metrics and dataset-level robustness checks. A gain decomposition reveals a simple mechanism: at the smallest budgets, the main benefit comes from enforcing coverage; as the budget increases, the decisive factor becomes the representation space in which coverage is measured. LUCoS mitigates failures of original feature space selection, showing that reliable unsupervised context selection depends less on selector sophistication than on defining representativeness in a meaningful representation geometry.
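The selection step described in the abstract, choosing representative medoids in an embedding space under a fixed labeling budget, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_medoids`, the alternating k-medoids procedure, the Euclidean distance, and the z-scored features standing in for unsupervised PFN embeddings are all assumptions made for the example.

```python
import numpy as np


def select_medoids(embeddings, budget, n_iter=50, seed=0):
    """Pick `budget` representative rows (medoids) via alternating k-medoids.

    Hypothetical helper for illustration; the paper's exact selector may differ.
    """
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    # Pairwise Euclidean distances in the embedding space (assumed metric).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Start from a random set of distinct candidate medoids.
    medoids = rng.choice(n, size=budget, replace=False)
    for _ in range(n_iter):
        # Assign every point to its nearest current medoid.
        assign = dist[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for k in range(budget):
            members = np.flatnonzero(assign == k)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[k] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break  # converged
        medoids = new_medoids
    return np.sort(medoids)


# Placeholder data: z-scored random features stand in for PFN embeddings.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Indices of the 10 instances that would be sent for labeling as context.
context_idx = select_medoids(Z, budget=10)
print(context_idx)
```

In a cold-start workflow, only the rows at `context_idx` would be labeled and passed to the tabular foundation model as context. Replacing `Z` with raw features versus latent embeddings is the comparison the abstract's gain decomposition is about.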
Problem

Research questions and friction points this paper is trying to address.

tabular foundation models
context selection
cold-start
unsupervised instance selection
representation geometry
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Geometry
Unsupervised Context Selection
Tabular Foundation Models
Medoid Selection
Cold-Start Learning
Oroel Ipas
Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI)
Guillermo Gomez-Trenado
Postdoc Researcher, University of Granada
AI, genAI, SSL, GPAI
Rocío Romero-Zaliz
Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), Department of Computer Science and Artificial Intelligence (DECSAI), Research Center in Information and Communication Technologies (CITIC), Instituto de Investigación Biosanitaria Ibs.GRANADA, University of Granada
Isaac Triguero
Department of Computer Science and Artificial Intelligence (DECSAI), Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI)