Strategic PAC Learnability via Geometric Definability

📅 2026-05-13
🤖 AI Summary
This work addresses the challenge that strategic classification poses to PAC learnability: because individuals may strategically modify their features, the induced hypothesis class can have a much larger sample complexity than the original one. To tackle this issue, the authors introduce a geometric definability assumption, which uniformly characterizes both hypothesis classes and natural cost structures (such as ℓₚ or Wasserstein distances) using first-order logical formulas over the real field augmented with exponentiation. By integrating tools from model theory, VC dimension theory, and real algebraic geometry, they establish, for the first time under general conditions, that when the hypothesis class and the cost-induced neighborhood relation satisfy geometric definability, the sample complexity of the strategically induced class is governed by the complexity of the defining formulas, thereby restoring PAC learnability.
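As a rough illustration of what such a defining formula can look like, consider an $\ell_p$ ball neighborhood in $\mathbb{R}^d$. This rendering is an assumption made here for illustration and is not taken from the paper; the symbols $N_{p,r}$, $h^{N}$, the radius $r > 0$, and the dimension $d$ are introduced only for this sketch. For real $p \ge 1$, the power $|t|^p$ with $t \neq 0$ can be written as $\exp(p \cdot \log |t|)$, so the neighborhood relation uses only arithmetic, exponentiation, logarithms, and comparisons:

$$
N_{p,r}(x, x') \;\iff\; \sum_{i \,:\, x_i \neq x'_i} \exp\bigl(p \cdot \log |x_i - x'_i|\bigr) \;\le\; \exp(p \cdot \log r)
$$

The sum over the coordinates with $x_i \neq x'_i$ is a finite case distinction, so it stays first-order. Under the usual convention that an individual moves to a positively labeled point whenever one is within reach, the strategically induced hypothesis would then be

$$
h^{N}(x) = 1 \;\iff\; \exists x' \in \mathbb{R}^d : \; N_{p,r}(x, x') \,\wedge\, h(x') = 1 .
$$

If $h$ is itself definable, the added existential quantifier keeps $h^{N}$ definable, which is the kind of closure a definability assumption makes available.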
📝 Abstract
Strategic classification studies learning settings in which individuals can modify their features, at a cost, in order to influence the classifier's decision. A central question is how the sample complexity of the induced (strategic) hypothesis class depends on the complexities of the underlying hypothesis class and the cost structure governing feasible manipulations. Prior work has shown that in several natural settings, such as linear classifiers with norm costs, the induced complexity can be controlled. We begin by showing that such guarantees fail in general, even in simple cases: there exist hypothesis classes of VC dimension $1$ on the real line such that, even under the simplest interval neighborhoods, the induced class has infinite VC dimension. Thus, strategic behavior can turn an easy learning problem into a non-learnable one. To overcome this, we introduce structure via a geometric definability assumption: both the hypothesis class and the cost-induced neighborhood relation can be defined by first-order formulas over $\mathbb{R}_{\mathtt{exp}}$. Intuitively, this means that hypotheses and costs can be described using arithmetic operations, exponentiation, logarithms, and comparisons. This captures a broad range of natural classes and cost functions, including $\ell_p$ distances, Wasserstein distance, and information-theoretic divergences. Under this assumption, we prove that learnability is preserved, with sample complexity controlled by the complexity of the defining formulas.
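The negative result can be made concrete with a small finite experiment. The sketch below is an illustrative construction chosen here, not necessarily the one used in the paper: anchors are placed at 0, 3, 6, ..., and for every subset S of anchors the hypothesis h_S is positive only on the points anchor + offset_S for anchors in S, where each subset gets its own offset in (0, 1). The supports are pairwise disjoint, so the base class has VC dimension 1, yet with interval neighborhoods of radius 1 the induced class realizes every labeling of the anchors. All names (`build_base_class`, `strategic_label`, `vc_dimension`) and the convention that an individual is labeled positive whenever a positive point lies within reach are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import combinations


def subsets(n):
    """All subsets of range(n), as frozensets."""
    return [frozenset(i for i in range(n) if mask >> i & 1)
            for mask in range(2 ** n)]


def build_base_class(n):
    """Hypothetical construction: one finite support per subset of anchors."""
    anchors = [Fraction(3 * i) for i in range(n)]
    base = []
    for k, s in enumerate(subsets(n)):
        # A distinct offset in (0, 1) per subset keeps supports disjoint.
        offset = Fraction(k + 1, 2 ** n + 1)
        base.append(frozenset(anchors[i] + offset for i in s))
    return anchors, base


def strategic_label(support, x, radius):
    """Positive iff some positive point lies in [x - radius, x + radius]."""
    return any(abs(x - z) <= radius for z in support)


def is_shattered(points, labelings):
    """True iff the labelings realize all 2^k patterns on the k points."""
    patterns = {tuple(lab(p) for p in points) for lab in labelings}
    return len(patterns) == 2 ** len(points)


def vc_dimension(domain, labelings):
    """Brute-force VC dimension of a finite class on a finite domain."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(is_shattered(c, labelings) for c in combinations(domain, k)):
            d = k
        else:
            break  # no shattered k-set implies no larger shattered set
    return d


n = 3
anchors, base = build_base_class(n)

# Points outside every support are always labeled 0, so they cannot help
# shatter anything; the union of supports is the only relevant domain.
domain = sorted(set().union(*base))
base_labelings = [lambda p, s=s: p in s for s in base]
induced_labelings = [lambda p, s=s: strategic_label(s, p, 1) for s in base]

base_vc = vc_dimension(domain, base_labelings)
induced_shatters = is_shattered(anchors, induced_labelings)

print("VC dimension of base class:", base_vc)
print("Induced class shatters all", n, "anchors:", induced_shatters)
```

With `n = 3` the base class has VC dimension 1 while the induced class shatters all three anchors. Nothing in the construction depends on `n`, so letting the family range over all finite subsets of infinitely many anchors keeps the base VC dimension at 1 while the induced class shatters arbitrarily large sets. Exact `Fraction` arithmetic is used so that the distance comparisons are not affected by floating-point rounding.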
Problem

Research questions and friction points this paper is trying to address.

strategic classification
sample complexity
VC dimension
learnability
geometric definability
Innovation

Methods, ideas, or system contributions that make the work stand out.

strategic classification
geometric definability
PAC learnability
VC dimension
first-order definability