Proper Learnability and the Role of Unlabeled Data

📅 2025-02-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates proper learning, where the learner must output a predictor from a prescribed hypothesis class, and asks under what assumptions a problem is properly learnable. In the distribution-fixed PAC model, where the unlabeled data distribution is given to the learner but performance is still measured in the worst case, the paper establishes that an optimal proper learner always exists and is governed by distributional regularization, a randomized generalization of regularization; the result holds for all metric loss functions and every finite learning problem, with no dependence on its size. The paper further shows that sample complexities in the distribution-fixed model can shrink by only a logarithmic factor relative to the classic PAC model, strongly limiting the worst-case benefit of unlabeled data. Complementing these positive results, it proves that in the realizable PAC model proper learnability can be logically undecidable (independent of the ZFC axioms), is not a monotone property of the hypothesis class, and is not a local property; all of these impossibility results hold already for multiclass classification and proceed through a reduction from EMX learning.
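As a rough formalization (ours, reconstructed from the abstract; the notation $A$, $H$, $D_X$, $\ell$, and the exact form of the guarantee are assumptions rather than the paper's definitions):

```latex
% Sketch of the proper-learning requirement and the distribution-fixed PAC
% criterion; notation is ours, not necessarily the paper's.
% A learner A is proper for a hypothesis class H if its output always lies in H:
%   A(S) \in H  for every training sample S.
% In the distribution-fixed model the marginal D_X over inputs is given to the
% learner, and the guarantee is still worst-case over joint distributions D
% whose input marginal equals D_X:
\[
  \sup_{D \,:\, D|_X = D_X}
  \ \mathbb{E}_{S \sim D^m}\!\left[\, L_D\big(A(S)\big) - \inf_{h \in H} L_D(h) \,\right]
  \;\le\; \varepsilon,
  \qquad
  L_D(h) = \mathbb{E}_{(x,y)\sim D}\big[\ell(h(x), y)\big],
\]
% with \ell a metric loss, matching the paper's general loss setting.
```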

📝 Abstract
Proper learning refers to the setting in which learners must emit predictors in the underlying hypothesis class $H$, and often leads to learners with simple algorithmic forms (e.g. empirical risk minimization (ERM), structural risk minimization (SRM)). The limitation of proper learning, however, is that there exist problems which can only be learned improperly, e.g. in multiclass classification. Thus, we ask: Under what assumptions on the hypothesis class or the information provided to the learner is a problem properly learnable? We first demonstrate that when the unlabeled data distribution is given, there always exists an optimal proper learner governed by distributional regularization, a randomized generalization of regularization. We refer to this setting as the distribution-fixed PAC model, and continue to evaluate the learner on its worst-case performance over all distributions. Our result holds for all metric loss functions and any finite learning problem (with no dependence on its size). Further, we demonstrate that sample complexities in the distribution-fixed PAC model can shrink by only a logarithmic factor from the classic PAC model, strongly refuting the role of unlabeled data in PAC learning (from a worst-case perspective). We complement this with impossibility results which obstruct any characterization of proper learnability in the realizable PAC model. First, we observe that there are problems whose proper learnability is logically undecidable, i.e., independent of the ZFC axioms. We then show that proper learnability is not a monotone property of the underlying hypothesis class, and that it is not a local property (in a precise sense). Our impossibility results all hold even for the fundamental setting of multiclass classification, and go through a reduction of EMX learning (Ben-David et al., 2019) to proper classification which may be of independent interest.
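To make the proper/improper distinction concrete, here is a minimal toy sketch (ours, not from the paper; the function names and the threshold class are illustrative): ERM always returns a member of $H$ and is therefore proper, while an aggregated predictor such as a majority vote over several hypotheses is in general improper.

```python
from collections import Counter

def erm(H, sample):
    """Proper learner: return the hypothesis in H with the fewest sample errors."""
    return min(H, key=lambda h: sum(h(x) != y for x, y in sample))

def majority_vote(H, sample, k=3):
    """Improper learner: vote among the k empirically best hypotheses; the
    resulting predictor is built from H but need not itself belong to H."""
    best = sorted(H, key=lambda h: sum(h(x) != y for x, y in sample))[:k]
    def predictor(x):
        votes = Counter(h(x) for h in best)
        return votes.most_common(1)[0][0]
    return predictor

if __name__ == "__main__":
    # H: threshold classifiers h_t(x) = 1[x >= t] for t = 0..9 (a finite class).
    H = [lambda x, t=t: int(x >= t) for t in range(10)]
    sample = [(1, 0), (3, 0), (6, 1), (8, 1)]
    h_proper = erm(H, sample)              # always some threshold in H
    g_improper = majority_vote(H, sample)  # a vote over thresholds; for richer
                                           # classes such an aggregate leaves H
    print([h_proper(x) for x in range(10)])
    print([g_improper(x) for x in range(10)])
```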
Problem

Research questions and friction points this paper is trying to address.

Determining conditions under which a hypothesis class is properly learnable.
The role of unlabeled data (a known input distribution) in PAC sample complexity.
The impossibility of characterizing proper learnability in the realizable PAC model.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributional regularization, a randomized generalization of regularization, yields an optimal proper learner when the unlabeled distribution is known (see the sketch after this list).
The distribution-fixed PAC model fixes the unlabeled data distribution while still evaluating learners on worst-case performance.
Proper learnability in multiclass classification can be logically undecidable, i.e., independent of the ZFC axioms.
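The abstract describes distributional regularization only as a randomized generalization of regularization, so the following is purely an illustrative guess at the idea rather than the paper's algorithm: instead of committing to a single regularized empirical risk minimizer, a learner may randomize its (still proper) output over $H$. The names `regularized_erm`, `distributional_regularization`, and the penalty `psi` are hypothetical.

```python
import math
import random

def regularized_erm(H, sample, psi, lam=0.1):
    """Classical regularization: deterministically pick the h in H that
    minimizes empirical risk plus a penalty lam * psi(h)."""
    def emp_risk(h):
        return sum(h(x) != y for x, y in sample) / len(sample)
    return min(H, key=lambda h: emp_risk(h) + lam * psi(h))

def distributional_regularization(H, sample, psi, lam=0.1, temperature=1.0):
    """Illustrative randomized variant (our guess, not the paper's method):
    draw h from H with probability decreasing in its regularized empirical
    risk.  The output is still a member of H, so the learner remains proper."""
    def emp_risk(h):
        return sum(h(x) != y for x, y in sample) / len(sample)
    scores = [emp_risk(h) + lam * psi(h) for h in H]
    weights = [math.exp(-s / temperature) for s in scores]
    return random.choices(H, weights=weights, k=1)[0]
```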