LUCoS: Latent Unsupervised Context Selection for Tabular Foundation Models

📅 2026-05-26

🤖 AI Summary

This work addresses context selection for tabular foundation models in the cold-start setting, where instances must be chosen before any labels are available. The authors propose LUCoS, which uses an unsupervised Prior-Fitted Network (PFN) to construct a latent representation space and selects context instances according to the geometry of that space rather than the original feature space. A medoid-based algorithm identifies representative samples, avoiding the unreliable distance metrics of the original feature space. On 67 OpenML-CC18 datasets evaluated across six low-label budgets, LUCoS ranks first among the compared methods in mean AUC, accuracy, and F1 score, supporting the role of latent representations in low-label performance.

📝 Abstract

Selecting which instances to label is a key challenge in low-label tabular learning. For recent Tabular Foundation Models such as TabPFN, context selection directly determines predictive performance. Supervised oracle experiments show that carefully chosen labeled context sets can strongly outperform random selection under the same labeling budget. However, the cold-start setting, where instances must be selected before any labels are available, has received little attention in the TFM literature. This problem is fundamentally geometric. In vision and language, foundation models induce embedding spaces where simple geometric selection methods are effective. In contrast, tabular instance selection has so far been performed predominantly in the original tabular space, which lacks a natural metric; heterogeneous types, mixed scales, and nonlinear interactions make raw-space distances unreliable for context construction, and original-space selection falls below random on the majority of datasets as the budget grows. We propose LUCoS (Latent Unsupervised Context Selection), which replaces raw-feature geometry with the latent geometry induced by embeddings from an unsupervised Prior-Fitted Network (PFN) and selects representative medoids as context. Evaluated on 67 OpenML-CC18 datasets across six low-label budgets, LUCoS ranks first under mean AUC, ACC, and F1, with conclusions stable across metrics and dataset-level robustness checks. A gain decomposition reveals a simple mechanism: at the smallest budgets, the main benefit comes from enforcing coverage; as the budget increases, the decisive factor becomes the representation space in which coverage is measured. LUCoS mitigates failures of original feature space selection, showing that reliable unsupervised context selection depends less on selector sophistication than on defining representativeness in a meaningful representation geometry.
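The selection step described in the abstract (embed the unlabeled rows, then pick representative medoids as the context to label) can be illustrated with a minimal sketch. This is not the authors' implementation. The function `embed_stub` is a hypothetical placeholder that only standardizes columns; in LUCoS the embeddings come from an unsupervised PFN, whose API is not shown here. The medoid search is a plain alternating k-medoids heuristic chosen for brevity, and the names `select_medoids` and `coverage` are likewise assumptions made for this example.

```python
import numpy as np


def embed_stub(X):
    """Hypothetical stand-in for the unsupervised PFN encoder.

    It only standardizes each column; a real latent embedding would replace it.
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # avoid division by zero on constant columns
    return (X - X.mean(axis=0)) / std


def pairwise_dists(Z):
    """Euclidean distance matrix between all rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    D = np.sqrt(np.maximum(D2, 0.0))  # clip tiny negatives from rounding
    np.fill_diagonal(D, 0.0)
    return D


def select_medoids(Z, budget, n_iter=50, seed=0):
    """Pick `budget` row indices of Z with an alternating k-medoids heuristic."""
    n = Z.shape[0]
    if not 1 <= budget <= n:
        raise ValueError("budget must be between 1 and the number of rows")
    rng = np.random.default_rng(seed)
    D = pairwise_dists(Z)
    medoids = rng.choice(n, size=budget, replace=False)  # random initialization
    for _ in range(n_iter):
        # Assign every point to its nearest current medoid.
        assign = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for k in range(budget):
            members = np.flatnonzero(assign == k)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # New medoid: the member with the smallest total distance to the others.
            within = D[np.ix_(members, members)].sum(axis=0)
            new_medoids[k] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # converged
        medoids = new_medoids
    return np.sort(medoids)


def coverage(Z, selected):
    """Mean distance from each row to its nearest selected row (lower is better)."""
    D = pairwise_dists(Z)
    return float(D[:, selected].min(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 5))           # synthetic unlabeled table
    Z = embed_stub(X)                       # placeholder for latent embeddings
    context_idx = select_medoids(Z, budget=10)
    print("rows to label:", context_idx)
    print("coverage:", coverage(Z, context_idx))
```

The rows indexed by `context_idx` would then be labeled and passed to the tabular foundation model as context. Swapping `embed_stub` for a different representation changes only the space in which representativeness is measured, which is the factor the abstract identifies as decisive at larger budgets.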

Problem

Research questions and friction points this paper is trying to address.

tabular foundation models

context selection

cold-start

unsupervised instance selection

representation geometry

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Geometry

Unsupervised Context Selection

Tabular Foundation Models