🤖 AI Summary
In the fully transductive setting, existing theory struggles to provide effective generalization guarantees for leave-one-out (LOO) prediction error across arbitrary hypothesis classes. This work proposes the Median of Level-Set Aggregation (MLSA) method, which constructs LOO predictions by aggregating over empirical-risk level sets centered around empirical risk minimizers, and introduces a local level-set growth condition to analyze its generalization performance. The paper establishes, for the first time, multiplicative oracle inequalities for LOO error that hold for general hypothesis classes, with applications to classification, density estimation, and logistic regression. Under 0–1 loss, it achieves an $O(d \log n)$ bound for VC classes, $O(\log|H|)$ for finite hypothesis classes, and $O(\log|P|)$ for finite density classes; logistic regression also attains an $O(d \log n)$ generalization bound.
📝 Abstract
Leave-one-out (LOO) prediction provides a principled, data-dependent measure of generalization, yet guarantees in fully transductive settings remain poorly understood beyond specialized models. We introduce Median of Level-Set Aggregation (MLSA), a general aggregation procedure based on empirical-risk level sets around the ERM. For arbitrary fixed datasets and losses satisfying a mild monotonicity condition, we establish a multiplicative oracle inequality for the LOO error of the form \[ \mathrm{LOO}_S(\hat{h}) \;\le\; C \cdot \frac{1}{n} \min_{h\in H} L_S(h) \;+\; \frac{\mathrm{Comp}(S,H,\ell)}{n}, \qquad C>1. \]
The analysis is based on a local level-set growth condition controlling how the set of near-optimal empirical-risk minimizers expands as the tolerance increases. We verify this condition in several canonical settings. For classification with VC classes under the 0-1 loss, the resulting complexity scales as $O(d \log n)$, where $d$ is the VC dimension. For finite hypothesis and density classes under bounded or log loss, it scales as $O(\log |H|)$ and $O(\log |P|)$, respectively. For logistic regression with bounded covariates and parameters, a volumetric argument based on the empirical covariance matrix yields complexity scaling as $O(d \log n)$ up to problem-dependent factors.
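Plugging the stated complexity terms into the oracle inequality gives the following instantiations (constants and the exact form of the complexity term are schematic):

```latex
% VC class of dimension d under 0-1 loss:
\mathrm{LOO}_S(\hat{h}) \;\le\; C \cdot \frac{1}{n}\min_{h\in H} L_S(h)
  \;+\; O\!\left(\frac{d\log n}{n}\right)

% Finite hypothesis class under bounded loss:
\mathrm{LOO}_S(\hat{h}) \;\le\; C \cdot \frac{1}{n}\min_{h\in H} L_S(h)
  \;+\; O\!\left(\frac{\log|H|}{n}\right)

% Finite density class under log loss:
\mathrm{LOO}_S(\hat{h}) \;\le\; C \cdot \frac{1}{n}\min_{p\in P} L_S(p)
  \;+\; O\!\left(\frac{\log|P|}{n}\right)
```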