Linear-Core Surrogates: Smooth Loss Functions with Linear Rates for Classification and Structured Prediction

📅 2026-04-30
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses a longstanding trade-off in classification and structured prediction between optimization efficiency and statistical consistency: smooth losses are easy to optimize but come with only square-root consistency bounds, while piecewise-linear losses offer linear consistency but are not differentiable. The paper proposes Linear-Core (LC) surrogate losses, which it presents as the first convex loss functions that are differentiable everywhere while strictly satisfying linear H-consistency. LC losses stitch a linear core to smooth tails; the resulting smoothness admits an unbiased stochastic gradient estimator, which in structured prediction bypasses the quadratic-complexity bottleneck of exact inference. Experiments show that LC achieves a 23× speedup over Structured SVM on large-vocabulary sequence labeling and improves accuracy by 2.6% over cross-entropy on CIFAR-10 with instance-dependent label noise.
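To make "linear core plus smooth tail" concrete, here is a minimal sketch of one binary margin loss with that shape. It is not the paper's LC surrogate, whose formula is not given here: it assumes a single quadratic tail (the classical smoothed, or Huberized, hinge), and the names `lc_surrogate_sketch`, `lc_surrogate_grad`, and the tail width `delta` are hypothetical. The loss is linear with slope -1 for margins t ≤ 1 - δ, quadratic on (1 - δ, 1), and zero beyond. Value and slope agree at both stitch points, so it is convex and differentiable everywhere, and for δ ≤ 1 the decision boundary t = 0 lies inside the linear core.

```python
def _check_delta(delta: float) -> None:
    """Reject tail widths that would push the stitch point past the boundary."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")


def lc_surrogate_sketch(t: float, delta: float = 0.5) -> float:
    """Hypothetical linear-core loss on the margin t = y * f(x).

    Assumption: the smooth tail is quadratic (smoothed hinge), not the
    paper's exact construction.
    """
    _check_delta(delta)
    if t <= 1.0 - delta:
        # Linear core: slope -1, offset chosen so the value matches the tail.
        return 1.0 - t - delta / 2.0
    if t < 1.0:
        # Quadratic tail: value delta/2 and slope -1 at t = 1 - delta,
        # value 0 and slope 0 at t = 1.
        return (1.0 - t) ** 2 / (2.0 * delta)
    # Beyond the margin the loss is flat at zero.
    return 0.0


def lc_surrogate_grad(t: float, delta: float = 0.5) -> float:
    """Derivative of lc_surrogate_sketch; continuous, so the loss is C^1."""
    _check_delta(delta)
    if t <= 1.0 - delta:
        return -1.0
    if t < 1.0:
        return -(1.0 - t) / delta
    return 0.0
```

For example, with the default δ = 0.5, `lc_surrogate_sketch(0.0)` returns 0.75 (on the linear core) and `lc_surrogate_sketch(0.75)` returns 0.0625 (on the tail). Shrinking δ moves the function toward the ordinary hinge, trading smoothness for a sharper kink.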
πŸ“ Abstract
The choice of loss function in classification involves a fundamental trade-off: smooth losses (like Cross-Entropy) enable fast optimization rates but yield slow square-root consistency bounds, while piecewise-linear losses (like Hinge) offer fast linear consistency rates but suffer from non-differentiability. We propose Linear-Core (LC) Surrogates, a new family of convex loss functions that resolve this tension by stitching a linear core to a smooth tail. We prove that these surrogates are differentiable everywhere while retaining strict linear $H$-consistency bounds, effectively combining the optimization benefits of smoothness with the statistical efficiency of margin-based losses. In the structured prediction setting, we show that this smoothness unlocks a massive computational and energy advantage: it allows for an unbiased stochastic gradient estimator that bypasses the quadratic complexity $O(|\mathscr{Y}|^2)$ of exact inference (e.g., Viterbi). Empirically, our method achieves a 23$\times$ speedup over Structured SVMs on large-vocabulary sequence tagging tasks and demonstrates superior robustness to instance-dependent label noise, outperforming Cross-Entropy by 2.6% on corrupted CIFAR-10.
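The abstract does not describe the estimator itself. The toy sketch below illustrates, under stated assumptions, the general principle that a sum-form differentiable surrogate admits a cheap unbiased gradient, whereas a max-form loss such as the Structured SVM's needs an exact argmax (loss-augmented inference) at every step. It assumes flat multiclass labels rather than sequences, a hypothetical sum-over-wrong-labels loss L(s, y) = Σ_{k≠y} φ(s_y - s_k), and a smoothed hinge for φ; `phi_grad`, `exact_grad`, and `sampled_grad` are illustrative names, not the paper's API. Sampling one wrong label uniformly and scaling its contribution by K - 1 gives an estimate whose expectation equals the exact gradient, at O(1) rather than O(K) cost per draw.

```python
import random
from typing import List, Optional


def phi_grad(t: float, delta: float = 0.5) -> float:
    """Derivative of a smoothed-hinge margin loss (hypothetical choice of phi)."""
    if t <= 1.0 - delta:
        return -1.0
    if t < 1.0:
        return -(1.0 - t) / delta
    return 0.0


def exact_grad(scores: List[float], y: int) -> List[float]:
    """Gradient of L(s, y) = sum_{k != y} phi(s_y - s_k) w.r.t. scores; O(K)."""
    g = [0.0] * len(scores)
    for k in range(len(scores)):
        if k == y:
            continue
        d = phi_grad(scores[y] - scores[k])
        g[y] += d  # d/ds_y of phi(s_y - s_k)
        g[k] -= d  # d/ds_k of phi(s_y - s_k)
    return g


def sampled_grad(
    scores: List[float],
    y: int,
    rng: Optional[random.Random] = None,
    wrong_label: Optional[int] = None,
) -> List[float]:
    """Unbiased one-sample estimate of exact_grad.

    Draws one wrong label uniformly (or uses `wrong_label` if given) and
    scales its term by K - 1, so the expectation over draws is the full sum.
    """
    K = len(scores)
    if wrong_label is None:
        if rng is None:
            rng = random.Random()
        k = rng.randrange(K - 1)  # uniform over K - 1 slots
        if k >= y:
            k += 1  # skip the correct label y
    else:
        if wrong_label == y:
            raise ValueError("wrong_label must differ from y")
        k = wrong_label
    d = (K - 1) * phi_grad(scores[y] - scores[k])
    g = [0.0] * K
    g[y] += d
    g[k] -= d
    return g
```

Averaging `sampled_grad` over every possible `wrong_label` reproduces `exact_grad` exactly, which is the unbiasedness property; in practice one would average a few random draws per example and accept the added variance in exchange for never enumerating the full label set.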
Problem

Research questions and friction points this paper is trying to address.

loss function
optimization
statistical consistency
structured prediction
classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear-Core Surrogates
smooth loss functions
linear consistency
structured prediction
stochastic gradient estimation