🤖 AI Summary
This work studies identification and generation in the limit when supervision comes solely from unordered contrastive pairs: pairs of examples known to have different labels, with no indication of which element is positive. For the noiseless setting, it gives an exact geometric characterization of contrastively identifiable classes and introduces the contrastive closure dimension, which exactly characterizes uniform contrastive generation with tight sample complexity. It also shows that contrastive generation and text identification are mutually incomparable. Under finite adversarial corruption, there exist classes that a single budget-independent algorithm identifies from contrastive pairs under any finite corruption budget, yet that are not identifiable from positive examples under even one corrupted observation. The unifying technical object is the common crossing graph, which encodes pairwise ambiguity, family-level generation obstructions, and corruption defects.
📝 Abstract
In the classical identification-in-the-limit model of Gold [1967], a stream of positive examples is presented round by round, and the learner must eventually recover the target hypothesis. Recently, Kleinberg and Mullainathan [2024] introduced generation in the limit, where the learner must instead eventually output novel elements of the target's support. Both lines of work focus on positive-only or fully labeled data. Yet many natural supervision signals are inherently relational rather than singleton: they encode relationships between examples rather than labels of individual ones. We initiate the study of contrastive identification and generation in the limit, where the learner observes a contrastive presentation of data: a stream of unordered pairs $\{x,y\}$ satisfying $h(x)\ne h(y)$ for an unknown target binary hypothesis $h$, where which element is positive is hidden from the learner. We first present three results in the noiseless setting: an exact characterization of contrastively identifiable classes (a one-line geometric refinement of the tell-tale condition of Angluin [1980]); a combinatorial dimension called the contrastive closure dimension (a contrastive analogue of the closure dimension in Raman et al. [2025]) that exactly characterizes uniform contrastive generation with tight sample complexity; and a strict hierarchy in which contrastive generation and text identification are mutually incomparable. We then prove a sharp reversal under finite adversarial corruption: there exist classes identifiable from contrastive pairs under any finite corruption budget by a single budget-independent algorithm, yet not identifiable from positive examples under even one corrupted observation. The unifying technical object is the common crossing graph, which encodes pairwise ambiguity, family-level generation obstructions, and corruption defects in a single coverage-and-incidence language.
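To make the contrastive presentation concrete, here is a minimal Python sketch on a toy finite domain. It is not the paper's algorithm: the domain, the hypothesis class (all binary labelings), and the names `consistent`, `target`, and `stream` are assumptions chosen for illustration. It only checks which hypotheses agree with a stream of unordered pairs $\{x,y\}$ satisfying $h(x)\ne h(y)$, and shows one direct consequence of the definition: a hypothesis and its complement are consistent with exactly the same pairs.

```python
from itertools import product

def consistent(h, pairs):
    """Return True if h labels the two elements of every pair differently."""
    for pair in pairs:
        x, y = tuple(pair)  # pairs are unordered, so the order here is arbitrary
        if h[x] == h[y]:
            return False
    return True

# Toy domain and hypothesis class (illustrative assumptions, not from the paper).
domain = [0, 1, 2, 3]
hypotheses = [dict(zip(domain, labels))
              for labels in product([0, 1], repeat=len(domain))]

# Hypothetical target: 0 and 1 are positive, 2 and 3 are negative.
target = {0: 1, 1: 1, 2: 0, 3: 0}
complement = {x: 1 - label for x, label in target.items()}

# Contrastive presentation: every unordered pair whose target labels differ.
# Which element of each pair is positive is not recorded.
stream = [frozenset({x, y})
          for x in domain for y in domain
          if x < y and target[x] != target[y]]

# Hypotheses that remain consistent after seeing the whole stream.
survivors = [h for h in hypotheses if consistent(h, stream)]
```

In this toy instance the only survivors are `target` and `complement`, since flipping every label preserves $h(x)\ne h(y)$ for each pair. How the paper's formal definitions handle this symmetry is not stated in the abstract.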