Strategic PAC Learnability via Geometric Definability

📅 2026-05-13

🤖 AI Summary

This work addresses the challenge that strategic classification poses to PAC learnability: individuals may modify their features at a cost, which can inflate the sample complexity of the induced hypothesis class. The authors introduce a geometric definability assumption under which both the hypothesis class and natural cost structures, such as ℓₚ or Wasserstein distances, are described by first-order formulas over the real field augmented with exponentiation. Using tools from model theory, VC dimension theory, and real algebraic geometry, they show that when this assumption holds, the sample complexity of the strategically induced class is controlled by the complexity of the defining formulas, so PAC learnability is preserved.
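To make the definability idea concrete, here is a sketch of how an ℓₚ neighborhood and the induced hypothesis could be written as first-order formulas over the reals with exponentiation. The symbols $N_{p,r}$, $\varphi$, $\theta$, and $h_\theta^{\mathrm{strat}}$ are illustrative assumptions and may not match the paper's notation.

```latex
% Illustrative notation only; the paper may define these objects differently.
% Neighborhood relation for an \ell_p cost (real p \ge 1) with budget r > 0,
% for x, y \in \mathbb{R}^d:
\[
  N_{p,r}(x, y) \;:\iff\; \sum_{i=1}^{d} |x_i - y_i|^{p} \le r^{p},
  \qquad \text{where } t^{p} = \exp(p \log t) \text{ for } t > 0 \text{ and } 0^{p} = 0 .
\]
% Strategically induced hypothesis, for a base hypothesis defined by a formula \varphi(y, \theta):
\[
  h_\theta^{\mathrm{strat}}(x) = 1
  \;\iff\;
  \exists y \, \bigl( N_{p,r}(x, y) \wedge \varphi(y, \theta) \bigr).
\]
```

The existential quantifier over $y$ keeps the induced hypothesis first-order definable, so in this sketch the induced class is described by a single formula in the variables $x$ and $\theta$.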

📝 Abstract

Strategic classification studies learning settings in which individuals can modify their features, at a cost, in order to influence the classifier's decision. A central question is how the sample complexity of the induced (strategic) hypothesis class depends on the complexities of the underlying hypothesis class and the cost structure governing feasible manipulations. Prior work has shown that in several natural settings, such as linear classifiers with norm costs, the induced complexity can be controlled. We begin by showing that such guarantees fail in general, even in simple cases: there exist hypothesis classes of VC dimension $1$ on the real line such that, even under the simplest interval neighborhoods, the induced class has infinite VC dimension. Thus, strategic behavior can turn an easy learning problem into a non-learnable one. To overcome this, we introduce structure via a geometric definability assumption: both the hypothesis class and the cost-induced neighborhood relation can be defined by first-order formulas over $\mathbb{R}_{\mathtt{exp}}$. Intuitively, this means that hypotheses and costs can be described using arithmetic operations, exponentiation, logarithms, and comparisons. This captures a broad range of natural classes and cost functions, including $\ell_p$ distances, Wasserstein distance, and information-theoretic divergences. Under this assumption, we prove that learnability is preserved, with sample complexity controlled by the complexity of the defining formulas.
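The blow-up described in the abstract can be illustrated with a small brute-force check. The construction below is one possible example chosen for illustration, not necessarily the one used in the paper: the helper names (`build_class`, `strategic_label`, `shatters`), the spacing of 3 between anchor points, and the radius of 1 are all assumptions. Each hypothesis labels a finite set of points positive, the hypotheses are pairwise disjoint (so the base class cannot shatter two points), and an agent at `x` is classified positive after manipulation if some positive point lies within distance `r`.

```python
from fractions import Fraction
from itertools import combinations


def base_label(h, x):
    """Label of x under hypothesis h (a finite set of positive points), no manipulation."""
    return int(x in h)


def strategic_label(h, x, r=1):
    """Induced label: x is positive if it can reach a positive point within distance r."""
    return int(any(abs(x - p) <= r for p in h))


def build_class(n):
    """One hypothesis per subset A of {0, ..., n-1} (hypothetical construction).

    Hypothesis A has positives at 3*i + eps_A for i in A, where eps_A is a
    distinct offset in (0, 1/2). Distinct offsets make the hypotheses pairwise disjoint.
    """
    hyps = []
    for mask in range(2 ** n):
        eps = Fraction(mask + 1, 2 * (2 ** n + 1))
        hyps.append(frozenset(3 * i + eps for i in range(n) if (mask >> i) & 1))
    return hyps


def shatters(label_fn, hyps, points):
    """True if the hypotheses realize all 2^k labelings of the k given points."""
    patterns = {tuple(label_fn(h, x) for x in points) for h in hyps}
    return len(patterns) == 2 ** len(points)


n = 3
hyps = build_class(n)
anchors = [3 * i for i in range(n)]

# Points outside every hypothesis are always labeled 0, so only points that are
# positive under some hypothesis can belong to a shattered set of the base class.
domain = sorted(set().union(*hyps))
base_pair_shattered = any(
    shatters(base_label, hyps, pair) for pair in combinations(domain, 2)
)
induced_shatters_anchors = shatters(strategic_label, hyps, anchors)

print("base class shatters some pair:", base_pair_shattered)
print("induced class shatters the anchors:", induced_shatters_anchors)
```

For a fixed `n`, this sketch only shows that the induced class can shatter `n` points while the base class shatters at most one. An infinite induced VC dimension would additionally require combining such families for all `n` into a single class, which the code does not attempt.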

Problem

Research questions and friction points this paper is trying to address.

strategic classification

sample complexity

VC dimension

learnability

geometric definability

Innovation

Methods, ideas, or system contributions that make the work stand out.

strategic classification

geometric definability

PAC learnability