🤖 AI Summary
This paper investigates the learnability of classifiers when strategic agents genuinely self-improve to attain a desirable classification (e.g., job seekers upskilling), rather than gaming the classifier. It introduces an asymmetric variant of minimally consistent concept classes and uses it to give an exact characterization of proper learning with improvements in the realizable setting. Moving beyond the arbitrary improvement regions studied in prior work, it establishes positive results for natural Euclidean ball improvement sets, including a characterization of improper learning under a mild generative assumption on the data distribution. In both PAC and mistake-bound frameworks, the work shows that natural improvement assumptions yield lower generalization error under well-studied bounded noise models, and it derives mistake bounds for realizable and agnostic online learning. These results resolve open questions posed by Attias et al. for both proper and improper learning.
📝 Abstract
Machine learning is now ubiquitous in societal decision-making, for example in evaluating job candidates or loan applications, and it is increasingly important to take into account how classified agents will react to the learning algorithms. The majority of recent literature on strategic classification has focused on reducing and countering deceptive behaviors by the classified agents, but recent work of Attias et al. identifies surprising properties of learnability when the agents genuinely improve in order to attain the desirable classification, such as smaller generalization error than standard PAC-learning. In this paper we characterize so-called learnability with improvements across multiple new axes. We introduce an asymmetric variant of minimally consistent concept classes and use it to provide an exact characterization of proper learning with improvements in the realizable setting. While prior work studies learnability only under general, arbitrary agent improvement regions, we give positive results for more natural Euclidean ball improvement sets. In particular, we characterize improper learning under a mild generative assumption on the data distribution. We further show how to learn in more challenging settings, achieving lower generalization error under well-studied bounded noise models and obtaining mistake bounds in realizable and agnostic online learning. We resolve open questions posed by Attias et al. for both proper and improper learning.