🤖 AI Summary
This paper addresses the estimation error arising in machine learning when non-differentiable or hard-to-optimize target losses (e.g., the 0–1 loss) are replaced by surrogate losses (e.g., cross-entropy). It develops a consistency theory tailored to a given hypothesis set $H$. First, it introduces $H$-consistency bounds, guarantees strictly stronger than Bayes-consistency and $H$-calibration, and derives tight distribution-dependent and distribution-independent bounds for binary and multi-class settings. Second, it provides the first tight $H$-consistency bounds for max, sum, constrained, and comp-sum surrogates (including cross-entropy and MAE), as well as for adversarially robust learning. Third, it shows that in some cases non-trivial $H$-consistency bounds are unattainable, and establishes a universal square-root growth rate for these bounds for smooth surrogates. Finally, it proposes smooth adversarial surrogate losses and empirically validates their effectiveness. The results deliver hypothesis-set-specific, distribution-adaptive guarantees for learning with non-differentiable target losses.
📝 Abstract
In machine learning, the loss functions optimized during training often differ from the target loss that defines task performance due to computational intractability or lack of differentiability. We present an in-depth study of the target loss estimation error relative to the surrogate loss estimation error. Our analysis leads to $H$-consistency bounds, which are guarantees accounting for the hypothesis set $H$. These bounds offer stronger guarantees than Bayes-consistency or $H$-calibration and are more informative than excess error bounds.
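To fix ideas, an $H$-consistency bound relating a surrogate loss $\ell_1$ to a target loss $\ell_2$ can be sketched in the following general form (notation illustrative, not taken verbatim from the paper):

$$
\mathcal{E}_{\ell_2}(h) - \mathcal{E}^*_{\ell_2}(H) \;\le\; f\bigl(\mathcal{E}_{\ell_1}(h) - \mathcal{E}^*_{\ell_1}(H)\bigr), \qquad \forall\, h \in H,
$$

where $\mathcal{E}_\ell(h)$ is the expected loss of $h$, $\mathcal{E}^*_\ell(H) = \inf_{h \in H} \mathcal{E}_\ell(h)$ is the best-in-class error, and $f$ is a non-decreasing function with $f(0) = 0$. Driving the surrogate excess error to zero then drives the target excess error to zero uniformly over $H$, which is precisely what Bayes-consistency alone does not guarantee for a restricted hypothesis set.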
We begin with binary classification, establishing tight distribution-dependent and -independent bounds. We provide explicit bounds for convex surrogates (including linear models and neural networks) and analyze the adversarial setting for surrogates like the $\rho$-margin and sigmoid losses. Extending to multi-class classification, we present the first $H$-consistency bounds for max, sum, and constrained losses, covering both non-adversarial and adversarial scenarios. We demonstrate that in some cases, non-trivial $H$-consistency bounds are unattainable. We also investigate comp-sum losses (e.g., cross-entropy, MAE), deriving their first $H$-consistency bounds and introducing smooth adversarial variants that yield robust learning algorithms.
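To make the losses named above concrete, here is a minimal sketch of the binary target and surrogate losses as functions of the margin $y\,h(x)$, using their standard textbook definitions (this is illustrative code, not an implementation from the paper):

```python
import math

def zero_one_loss(margin: float) -> float:
    # Target loss: 1 if the margin y*h(x) is non-positive, else 0.
    # Non-differentiable, hence the need for surrogates.
    return 1.0 if margin <= 0 else 0.0

def rho_margin_loss(margin: float, rho: float = 1.0) -> float:
    # rho-margin loss: 1 for margin <= 0, linear ramp on (0, rho), 0 past rho.
    return min(1.0, max(0.0, 1.0 - margin / rho))

def logistic_loss(margin: float) -> float:
    # Cross-entropy (logistic) surrogate on the margin, natural log.
    # log1p(exp(-m)) is a numerically stable form of log(1 + e^{-m}).
    return math.log1p(math.exp(-margin))
```

For example, at margin $0$ the 0–1 loss is $1$, the $\rho$-margin loss is $1$, and the logistic loss is $\ln 2 \approx 0.693$; all three vanish for sufficiently large positive margins.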
We develop a comprehensive framework for deriving these bounds across various surrogates, introducing new characterizations for constrained and comp-sum losses. Finally, we examine the growth rates of $H$-consistency bounds, establishing a universal square-root growth rate for smooth surrogates in binary and multi-class tasks, and analyze minimizability gaps to guide surrogate selection.
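The square-root growth rate can be sketched as follows (notation illustrative): for a smooth surrogate $\Phi$ and the 0–1 target loss, the modulus $\Gamma$ in

$$
\mathcal{E}_{\ell_{0\text{-}1}}(h) - \mathcal{E}^*_{\ell_{0\text{-}1}}(H) \;\le\; \Gamma\bigl(\mathcal{E}_{\Phi}(h) - \mathcal{E}^*_{\Phi}(H)\bigr)
$$

satisfies $\Gamma(t) = \Theta(\sqrt{t})$ near $t = 0$, so halving the surrogate excess error shrinks the guaranteed target excess error only by a factor of about $\sqrt{2}$. The minimizability gap $\mathcal{M}_\Phi(H) = \mathcal{E}^*_\Phi(H) - \mathbb{E}_x\bigl[\inf_{h \in H} \mathbb{E}[\Phi \mid x]\bigr]$ then differentiates surrogates that share this rate: smaller gaps yield more favorable bounds, guiding surrogate selection.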