In-Context Positive-Unlabeled Learning

📅 2026-05-06

🤖 AI Summary
This work addresses positive-unlabeled (PU) binary classification, where only labeled positives and unlabeled samples are available, by proposing PUICL, an in-context learning method that requires neither fine-tuning nor task-specific training. PUICL brings the in-context learning paradigm to PU learning, using a Transformer pretrained on synthetic PU datasets generated from randomly instantiated structural causal models. At inference time, it takes the labeled positives and the unlabeled instances as a single input and returns class probabilities for the unlabeled rows in one forward pass. On 20 semi-synthetic PU benchmarks, PUICL outperforms four standard PU learning baselines in average AUC and accuracy and is competitive on F1-score, suggesting that in-context learning extends from fully supervised tabular prediction to the PU setting.
📝 Abstract
Positive-unlabeled (PU) learning addresses binary classification when only a set of labeled positives is available alongside a pool of unlabeled samples drawn from a mixture of positives and negatives. Existing PU methods typically require dataset-specific training or iterative optimization, which limits their applicability when many tasks must be solved quickly or with little tuning. We introduce PUICL, a pretrained transformer that solves PU classification entirely through in-context learning. PUICL is pretrained on synthetic PU datasets generated from randomly instantiated structural causal models, exposing it to a wide range of feature-label relationships and class-prior configurations. At inference time, PUICL receives the labeled positives and the unlabeled samples as a single input and returns class probabilities for the unlabeled rows in one forward pass, with no gradient updates or per-task fitting. On 20 semi-synthetic PU benchmarks derived from the UCI Machine Learning Repository, OpenML, and scikit-learn, PUICL outperforms four standard PU learning baselines in average AUC and accuracy, and is competitive on F1-score. These results show that the in-context learning paradigm extends naturally beyond fully supervised tabular prediction to the semi-supervised PU setting.
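To make the data layout in the abstract concrete, here is a minimal sketch of how a fully labeled binary dataset might be turned into a semi-synthetic PU task and assembled into a single input. Everything here is an illustrative assumption rather than the paper's procedure: the helper names `make_pu_split` and `build_context` are hypothetical, the labeled positives are chosen uniformly at random, and the 0/1 flag vector is just one plausible way to mark which rows are labeled. The pretrained model itself is not shown.

```python
import random


def make_pu_split(X, y, label_frac=0.3, seed=0):
    """Turn a fully labeled binary dataset into a PU task (hypothetical helper).

    A random fraction `label_frac` of the positives keeps its label; every
    other row (remaining positives and all negatives) joins the unlabeled pool.
    Assumption: labeled positives are selected uniformly at random.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    if not pos_idx:
        raise ValueError("need at least one positive example")
    n_labeled = max(1, round(label_frac * len(pos_idx)))
    labeled = set(rng.sample(pos_idx, n_labeled))

    X_pos = [X[i] for i in sorted(labeled)]
    X_unl = [X[i] for i in range(len(X)) if i not in labeled]
    # Hidden ground truth for the unlabeled rows, used only for evaluation.
    y_unl_true = [y[i] for i in range(len(X)) if i not in labeled]
    return X_pos, X_unl, y_unl_true


def build_context(X_pos, X_unl):
    """Stack labeled positives and unlabeled rows into one input block.

    Returns the stacked rows plus a flag per row:
    1 = labeled positive, 0 = unlabeled (assumed encoding, not from the paper).
    """
    rows = X_pos + X_unl
    flags = [1] * len(X_pos) + [0] * len(X_unl)
    return rows, flags


# Toy data: 200 rows, 5 Gaussian features, label set by a linear rule.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [1 if row[0] + 0.5 * row[1] > 0 else 0 for row in X]

X_pos, X_unl, y_unl_true = make_pu_split(X, y, label_frac=0.3, seed=1)
rows, flags = build_context(X_pos, X_unl)

# Share of positives hidden in the unlabeled pool (the class prior of the pool).
class_prior = sum(y_unl_true) / len(y_unl_true)
```

In this layout a model would see `rows` and `flags` together and score only the rows whose flag is 0. The hidden labels in `y_unl_true` are what metrics such as AUC, accuracy, and F1-score would be computed against.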
Problem

Research questions and friction points this paper is trying to address.

Positive-Unlabeled Learning
In-Context Learning
Binary Classification
Semi-Supervised Learning
Tabular Data
Innovation

Methods, ideas, or system contributions that make the work stand out.

in-context learning
positive-unlabeled learning
pretrained transformer
synthetic data generation
zero-shot classification