🤖 AI Summary
Existing omnipredictor constructions suffer from high sample complexity, long training times, and complex, hard-to-interpret output hypotheses, making it impractical to achieve ε-competitive prediction against bounded linear predictors under matching losses induced by monotone, Lipschitz link functions. This work introduces efficient *omnipredictors* for single-index models (SIMs): (1) a simple, practical omnipredictor construction for SIMs; (2) a new, sharp analysis of the classical Isotron algorithm in the agnostic (non-realizable) setting; and (3) output predictors that are multi-index models with ≈ε⁻² prediction heads, a step toward proper omniprediction. The guarantees hold for all matching losses induced by monotone, Lipschitz link functions in the agnostic learning setting. Our approach reduces sample complexity to ≈ε⁻⁴ (and to ≈ε⁻² when link functions are bi-Lipschitz), a substantial improvement over the prior ≳ε⁻¹⁰ bound; it runs in nearly linear time; and it outputs compact, interpretable multi-index predictors, enabling both statistical efficiency and practical deployability.
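
For context, omniprediction (as defined in [GKR+22]) asks that a single predictor $p$, after a loss-specific post-processing $k_\ell$, competes with the best comparator for every loss in the family. Roughly, the guarantee takes the following form; this display is an explanatory paraphrase of the standard definition, not taken verbatim from the paper:

$$\forall\, \ell \in \mathcal{L}: \quad \mathbb{E}\big[\ell\big(y,\, k_\ell(p(x))\big)\big] \;\le\; \min_{c \in \mathcal{C}} \mathbb{E}\big[\ell\big(y,\, c(x)\big)\big] + \varepsilon.$$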
📝 Abstract
Recent work on supervised learning [GKR+22] defined the notion of omnipredictors, i.e., predictor functions $p$ over features that are simultaneously competitive for minimizing a family of loss functions $\mathcal{L}$ against a comparator class $\mathcal{C}$. Omniprediction requires approximating the Bayes-optimal predictor beyond the loss minimization paradigm, and has generated significant interest in the learning theory community. However, even for basic settings such as agnostically learning single-index models (SIMs), existing omnipredictor constructions require impractically large sample complexities and runtimes, and output complex, highly-improper hypotheses. Our main contribution is a new, simple construction of omnipredictors for SIMs. We give a learner outputting an omnipredictor that is $\varepsilon$-competitive on any matching loss induced by a monotone, Lipschitz link function, when the comparator class is bounded linear predictors. Our algorithm requires $\approx \varepsilon^{-4}$ samples and runs in nearly-linear time, and its sample complexity improves to $\approx \varepsilon^{-2}$ if link functions are bi-Lipschitz. This significantly improves upon the only prior known construction, due to [HJKRR18, GHK+23], which used $\gtrsim \varepsilon^{-10}$ samples. We achieve our construction via a new, sharp analysis of the classical Isotron algorithm [KS09, KKKS11] in the challenging agnostic learning setting, of potential independent interest. Previously, Isotron was known to properly learn SIMs in the realizable setting, and to yield constant-factor competitive hypotheses under the squared loss [ZWDD24]. As they are based on Isotron, our omnipredictors are multi-index models with $\approx \varepsilon^{-2}$ prediction heads, bringing us closer to the tantalizing goal of proper omniprediction for general loss families and comparators.
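
To make the algorithmic core concrete, below is a minimal sketch of the classical Isotron iteration [KS09] that the construction builds on: it alternates isotonic regression (to fit a monotone link) with a perceptron-style update of the weight vector. This is an illustrative sketch of the baseline algorithm only, not the paper's refined variant or its multi-head omnipredictor; the function name, iteration count, and synthetic data are our own assumptions.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def isotron(X, y, n_iters=100):
    """Classical Isotron [KS09]: alternately fit a monotone link via isotonic
    regression and update the weight vector with a perceptron-like step."""
    n, d = X.shape
    w = np.zeros(d)
    link = None
    for _ in range(n_iters):
        z = X @ w                                    # current 1-d projections w . x_i
        link = IsotonicRegression(out_of_bounds="clip")
        u_of_z = link.fit_transform(z, y)            # monotone fit u(w . x_i) to labels
        w = w + X.T @ (y - u_of_z) / n               # perceptron-style residual update
    return w, link                                   # SIM predictor: x -> link.predict(x @ w)

# Toy usage on synthetic data from a noisy sigmoidal SIM (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
w_star = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = 1 / (1 + np.exp(-X @ w_star)) + 0.05 * rng.normal(size=2000)
w_hat, link = isotron(X, y)
preds = link.predict(X @ w_hat)
```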