🤖 AI Summary
This work investigates the statistical complexity of Positive-Unlabeled (PU) learning when the class prior is unknown. Unlike mainstream approaches that assume a known class prior or strong sampling assumptions, we analyze PU learning in the more realistic setting where the positive class prior is not available to the learner. We establish, for the first time, tight upper and lower sample complexity bounds on the minimal numbers of positive and unlabeled samples required. Leveraging statistical learning theory and empirical process techniques, we rigorously derive these bounds and quantify their dependence on the (unknown) class prior, classifier complexity, and distribution shift. Our results demonstrate that PU learning remains statistically learnable even without prior knowledge, with sample requirements exceeding those of supervised learning by only a logarithmic factor. This work provides the first prior-free theoretical foundation for PU learning, substantially broadening its applicability and offering practical guidance in real-world scenarios such as medical screening and anomaly detection.
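To give a rough sense of what a "logarithmic factor" overhead over supervised learning means, the sketch below pairs the standard agnostic PAC sample complexity for a hypothesis class of VC dimension d (a known textbook result) with a purely schematic PU-style bound carrying a log-factor overhead. The symbols m_P and m_U (positive and unlabeled sample sizes), the argument of the logarithm, and the exact constants are illustrative assumptions, not the paper's stated theorems.

```latex
% Illustrative sketch only. The first expression is the classical agnostic PAC
% sample complexity for a VC class of dimension d; the second shows the generic
% shape of a "supervised cost times a logarithmic factor" bound of the kind
% described above. The exact form of the paper's bounds may differ.
\[
  m_{\mathrm{sup}}(\epsilon,\delta)
    \;=\; \Theta\!\left(\frac{d + \log(1/\delta)}{\epsilon^{2}}\right),
  \qquad
  m_{P},\, m_{U}
    \;=\; O\!\left(\frac{d + \log(1/\delta)}{\epsilon^{2}} \cdot \log\frac{1}{\epsilon}\right).
\]
```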
📝 Abstract
PU (Positive-Unlabeled) learning is a variant of supervised classification in which the only labels revealed to the learner are those of positively labeled instances. PU learning arises in many real-world applications. Most existing work relies on the simplifying assumptions that the positively labeled training data is drawn from the restriction of the data-generating distribution to positively labeled instances and/or that the proportion of positively labeled points (a.k.a. the class prior) is known a priori to the learner. This paper provides a theoretical analysis of the statistical complexity of PU learning under a wider range of setups. Unlike most prior work, our study does not assume that the class prior is known to the learner. We prove upper and lower bounds on the required sample sizes (of both the positively labeled and the unlabeled samples).