🤖 AI Summary
This paper investigates PAC learnability for binary classification under performative distribution shift—where model deployment alters the test distribution, causing standard training paradigms to fail. To address this, the authors construct an unbiased risk estimator whose form depends only on the original (pre-deployment) data and the structure of the performative shift, thereby enabling estimation of the true performative risk without access to shifted test samples. They prove that, under common shift models—including label-linear shifts and joint feature-label shifts—standard PAC-learnable hypothesis classes remain learnable via this estimator. The theoretical analysis establishes generalization error bounds, and the authors design an optimization algorithm based on empirical risk minimization over the proposed unbiased risk. Experiments on synthetic and real-world datasets demonstrate that the method significantly improves test performance under performative shift compared to conventional approaches.
📝 Abstract
Following the widespread adoption of machine learning models in real-world applications, the phenomenon of performativity, i.e., model-dependent shifts in the test distribution, becomes increasingly prevalent. Unfortunately, since models are usually trained solely on samples from the original (unshifted) distribution, this performative shift may lead to decreased test-time performance. In this paper, we study the question of whether and when performative binary classification problems are learnable, through the lens of the classic PAC (Probably Approximately Correct) learning framework. We motivate several performative scenarios, accounting in particular for linear shifts in the label distribution, as well as for more general changes in both the labels and the features. We construct a performative empirical risk function, which depends only on data from the original distribution and on the type of performative effect, yet is an unbiased estimate of the true risk of a classifier on the shifted distribution. Minimizing this notion of performative risk allows us to show that any PAC-learnable hypothesis space in the standard binary classification setting remains PAC-learnable for the considered performative scenarios. We also conduct an extensive experimental evaluation of our performative risk minimization method and showcase benefits on synthetic and real data.
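To make the core idea concrete, the following is a minimal sketch of an unbiased performative risk estimator of the kind the abstract describes, under the simplifying assumption that the shifted distribution D(h) has a known density ratio w(x, y; h) with respect to the original distribution D (the function names `performative_empirical_risk`, `weight`, and `loss` are illustrative, not from the paper):

```python
import numpy as np

def performative_empirical_risk(X, y, h, loss, weight):
    """Estimate the risk of classifier h on the performatively shifted
    distribution D(h), using only samples (X, y) drawn from the original
    distribution D.

    weight(x_i, y_i, h) is the assumed-known density ratio dD(h)/dD that
    encodes the performative shift model; loss(pred, y) is a per-sample
    loss. Reweighting makes E_D[w * loss] = E_{D(h)}[loss], so the
    estimator is unbiased for the true performative risk.
    """
    preds = h(X)
    w = np.array([weight(xi, yi, h) for xi, yi in zip(X, y)])
    losses = np.array([loss(pi, yi) for pi, yi in zip(preds, y)])
    return float(np.mean(w * losses))

# Toy usage: a classifier that always predicts 1, zero-one loss, and a
# constant-weight "shift" that simply doubles the mass of every point.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)

always_one = lambda X: np.ones(len(X), dtype=int)
zero_one = lambda p, yi: float(p != yi)

risk_unshifted = performative_empirical_risk(
    X, y, always_one, zero_one, lambda xi, yi, h: 1.0)
risk_doubled = performative_empirical_risk(
    X, y, always_one, zero_one, lambda xi, yi, h: 2.0)
```

With unit weights the estimator reduces to the ordinary empirical risk, and a constant weight scales it proportionally; in the scenarios the paper studies, the weights would instead be derived from the assumed shift model (e.g., a linear shift in the label distribution).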