🤖 AI Summary
This work investigates whether supervised learning models can be trained, and still generalize, without access to any ground-truth labels $y$. To this end, we propose the *$y$-free smooth operator* paradigm, which constructs the model's smoothing operator $S(x)$ from the input features $x$ alone, so that no ground-truth label enters training. We provide the first theoretical guarantee that supervised models can achieve effective training without ground-truth labels. Furthermore, we introduce an unsupervised model selection criterion based on predictive distribution consistency, which removes the conventional reliance on labeled data for cross-validation. Empirical evaluation on synthetic and real-world datasets shows that linear and kernel ridge regression, spline smoothing, and neural networks, all trained exclusively on random (i.e., meaningless) labels, achieve performance comparable to standard supervised learning and substantially surpass random guessing. These results support the core finding: ground-truth labels are not strictly necessary for effective supervised model training and generalization.
📝 Abstract
The success of unsupervised learning raises the question of whether supervised models can also be trained without using the information in the output $y$. In this paper, we demonstrate that this is indeed possible. The key step is to formulate the model as a smoother, i.e., of the form $\hat{f}=Sy$, and to construct the smoother matrix $S$ independently of $y$, e.g., by training on random labels. We present a simple model selection criterion based on the distribution of the out-of-sample predictions and show that, in contrast to cross-validation, this criterion can be used without access to $y$. We demonstrate on real and synthetic data that $y$-free trained versions of linear and kernel ridge regression, smoothing splines, and neural networks perform similarly to their standard, $y$-based versions and, most importantly, significantly better than random guessing.
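To make the smoother form $\hat{f}=Sy$ concrete, the following is a minimal sketch for kernel ridge regression, where the smoother matrix can be written in closed form from the inputs alone. It is an illustration under stated assumptions, not the paper's implementation: the Gaussian kernel, the helper names `rbf_kernel` and `krr_smoother`, the toy sine data, and the hand-picked `bandwidth` and `ridge` values are all hypothetical. In particular, the paper's $y$-free model selection criterion is not reproduced here; the hyperparameters are simply fixed.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    # Gaussian kernel matrix between the rows of a and the rows of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))

def krr_smoother(x_train, x_new, bandwidth, ridge):
    # S = K(x_new, x_train) (K(x_train, x_train) + ridge * I)^{-1}.
    # Only inputs and hyperparameters appear here; no labels are used.
    k_tt = rbf_kernel(x_train, x_train, bandwidth)
    k_nt = rbf_kernel(x_new, x_train, bandwidth)
    n = x_train.shape[0]
    # The regularized kernel matrix is symmetric, so solving and transposing
    # gives k_nt @ inv(k_tt + ridge * I) without forming the inverse.
    return np.linalg.solve(k_tt + ridge * np.eye(n), k_nt.T).T

# Toy data (assumed for illustration): noisy sine curve.
rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, size=(80, 1))
x_test = rng.uniform(-3, 3, size=(40, 1))
y_train = np.sin(x_train[:, 0]) + 0.1 * rng.standard_normal(80)
y_test = np.sin(x_test[:, 0])

# Build S from x only; hyperparameters are fixed by hand in this sketch.
S = krr_smoother(x_train, x_test, bandwidth=1.0, ridge=0.1)

# The labels enter only in the final, linear step f_hat = S y.
f_hat = S @ y_train

mse_smoother = np.mean((f_hat - y_test) ** 2)
mse_mean = np.mean((np.mean(y_train) - y_test) ** 2)  # constant-prediction baseline
print(f"smoother MSE: {mse_smoother:.4f}, mean-baseline MSE: {mse_mean:.4f}")
```

Because `S` is computed before `y_train` is touched, any procedure that chooses `bandwidth` and `ridge` without looking at the labels leaves the entire construction of the smoother $y$-free; the labels are applied only afterwards through a single matrix-vector product.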