In-Context Positive-Unlabeled Learning

📅 2026-05-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses positive-unlabeled (PU) binary classification, where only labeled positives and unlabeled samples are available, by proposing PUICL, an in-context learning method that requires neither fine-tuning nor task-specific training. PUICL brings the in-context learning paradigm to PU learning, using a Transformer pretrained on synthetic PU datasets generated from randomly instantiated structural causal models. At inference time, it takes the labeled positives and the unlabeled instances as a single input and returns class probabilities for the unlabeled rows in one forward pass. On 20 semi-synthetic PU benchmarks, PUICL outperforms four standard PU learning baselines in average AUC and accuracy and is competitive on F1-score, suggesting that in-context learning extends beyond fully supervised tabular prediction to the PU setting.
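
To make the "single input" idea concrete, here is a minimal sketch of how labeled positives and unlabeled rows might be stacked into one block before a forward pass. The function name `build_pu_context` and the trailing flag column (1.0 for a known positive, 0.0 for an unlabeled row) are hypothetical choices made for illustration; the summary does not specify how PUICL actually encodes its input.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]


def build_pu_context(
    positives: List[Row], unlabeled: List[Row]
) -> Tuple[List[List[float]], List[int]]:
    """Stack labeled positives and unlabeled rows into one input block.

    Each output row carries a trailing flag: 1.0 for a known positive,
    0.0 for an unlabeled row. This encoding is an assumption made for
    illustration, not the encoding used by PUICL.

    Returns the stacked block and the positions of the unlabeled rows,
    i.e. the rows a model would have to score.
    """
    if not positives:
        raise ValueError("PU learning needs at least one labeled positive")
    width = len(positives[0])
    block: List[List[float]] = []
    query_positions: List[int] = []

    for row in positives:
        if len(row) != width:
            raise ValueError("all rows must have the same number of features")
        block.append(list(row) + [1.0])

    for row in unlabeled:
        if len(row) != width:
            raise ValueError("all rows must have the same number of features")
        query_positions.append(len(block))
        block.append(list(row) + [0.0])

    return block, query_positions
```

A pretrained in-context model would consume a block like this once and emit one probability per entry in `query_positions`, with no gradient step in between.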

📝 Abstract

Positive-unlabeled (PU) learning addresses binary classification when only a set of labeled positives is available alongside a pool of unlabeled samples drawn from a mixture of positives and negatives. Existing PU methods typically require dataset-specific training or iterative optimization, which limits their applicability when many tasks must be solved quickly or with little tuning. We introduce PUICL, a pretrained transformer that solves PU classification entirely through in-context learning. PUICL is pretrained on synthetic PU datasets generated from randomly instantiated structural causal models, exposing it to a wide range of feature-label relationships and class-prior configurations. At inference time, PUICL receives the labeled positives and the unlabeled samples as a single input and returns class probabilities for the unlabeled rows in one forward pass, with no gradient updates or per-task fitting. On 20 semi-synthetic PU benchmarks derived from the UCI Machine Learning Repository, OpenML, and scikit-learn, PUICL outperforms four standard PU learning baselines in average AUC and accuracy, and is competitive on F1-score. These results show that the in-context learning paradigm extends naturally beyond fully supervised tabular prediction to the semi-supervised PU setting.
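
As a rough illustration of what a semi-synthetic PU benchmark involves, the sketch below turns a fully labeled binary dataset into a PU task by revealing a few positives and hiding every other label, then scores predictions on the unlabeled pool with AUC. The helper names `make_pu_split` and `auc`, and the choice to sample labeled positives uniformly at random, are assumptions for illustration; the abstract does not describe the paper's exact construction protocol.

```python
import random
from typing import List, Sequence, Tuple


def make_pu_split(
    labels: Sequence[int], n_labeled_pos: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split row indices into labeled positives and an unlabeled pool.

    A random subset of the true positives is revealed; every other row
    (remaining positives and all negatives) goes to the unlabeled pool.
    Uniform sampling of the revealed positives is an assumption here.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    if not 0 < n_labeled_pos < len(pos_idx):
        raise ValueError("n_labeled_pos must leave some positives unlabeled")
    labeled = sorted(rng.sample(pos_idx, n_labeled_pos))
    labeled_set = set(labeled)
    unlabeled = [i for i in range(len(labels)) if i not in labeled_set]
    return labeled, unlabeled


def auc(scores: Sequence[float], truth: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Quadratic in the pool size, which is fine
    for a small illustration.
    """
    pos = [s for s, y in zip(scores, truth) if y == 1]
    neg = [s for s, y in zip(scores, truth) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Because the hidden labels of the unlabeled pool are known in a semi-synthetic setup, metrics such as AUC, accuracy, and F1 can be computed on that pool even though the learner never sees a labeled negative.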

Problem

Research questions and friction points this paper is trying to address.

Positive-Unlabeled Learning

In-Context Learning

Binary Classification

Semi-Supervised Learning

Tabular Data

Innovation

Methods, ideas, or system contributions that make the work stand out.

In-Context Learning

Positive-Unlabeled Learning

Pretrained Transformer