🤖 AI Summary
Balcan et al. (2022) introduced Robustly Reliable Learning (RRL), but their framework becomes vacuous for high-capacity hypothesis classes (the learner is forced to abstain), and their generic algorithm requires repeated calls to an ERM oracle per test point, rendering inference computationally intractable at scale.
Method: We propose Regularized Robustly Reliable Learning (R²RL), a formalization of RRL with explicit regularization that preserves nontrivial reliability guarantees under expressive hypothesis classes. Our approach uses a sublinear-inference algorithm that dynamically maintains a regularized ERM solution, eliminating per-test-point retraining. We instantiate R²RL for linear classifiers over log-concave data distributions to enable efficient, theoretically grounded prediction.
Contribution/Results: R²RL achieves sublinear inference complexity, i.e., the amortized cost per test point is asymptotically below training time, while preserving rigorous per-instance reliability guarantees. This makes it substantially more scalable than prior RRL methods, especially in large-scale or streaming inference settings.
📝 Abstract
Instance-targeted data poisoning attacks, where an adversary corrupts a training set to induce errors on specific test points, have raised significant concerns. Balcan et al. (2022) proposed an approach to this challenge by defining a notion of robustly-reliable learners that provide per-instance guarantees of correctness under well-defined assumptions, even in the presence of data poisoning attacks. They then give a generic optimal (but computationally inefficient) robustly-reliable learner, as well as a computationally efficient algorithm for the case of linear separators over log-concave distributions. In this work, we address two challenges left open by Balcan et al. (2022). The first is that their definition of robustly-reliable learners becomes vacuous for highly flexible hypothesis classes: if there are two classifiers h_0, h_1 ∈ H, both with zero error on the training set, such that h_0(x) ≠ h_1(x), then a robustly-reliable learner must abstain on x. We address this problem by defining a modified notion of regularized robustly-reliable learners that allows for nontrivial statements in this case. The second is that the generic algorithm of Balcan et al. (2022) requires re-running an ERM oracle (essentially, retraining the classifier) on each test point x, which is generally impractical even if ERM can be implemented efficiently. To tackle this problem, we show that, at least in certain interesting cases, we can design algorithms that produce their outputs in time sublinear in training time, using techniques from dynamic algorithm design.
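The abstention phenomenon described above can be illustrated with a toy sketch (this is not the paper's algorithm; the hypothesis class, helper names, and data are invented for illustration): over a finite class H, a learner predicts on x only when every hypothesis with zero training error agrees on x, and abstains otherwise. The more flexible H is, the more disagreement survives training, which is exactly why the unregularized definition degenerates.

```python
# Toy illustration of version-space agreement (hypothetical example, not
# the algorithm of Balcan et al. (2022)): predict only when all
# zero-training-error hypotheses agree on x; otherwise abstain (None).

def consistent_hypotheses(H, train):
    """Hypotheses in H with zero error on the (possibly poisoned) training set."""
    return [h for h in H if all(h(x) == y for x, y in train)]

def reliable_predict(H, train, x):
    """Agreed label of all consistent hypotheses on x, or None (abstain)."""
    labels = {h(x) for h in consistent_hypotheses(H, train)}
    return labels.pop() if len(labels) == 1 else None

# Threshold classifiers on the real line: h_t(x) = 1 iff x >= t.
# (t=t in the lambda pins each threshold at definition time.)
H = [lambda x, t=t: int(x >= t) for t in (0.0, 1.0, 2.0, 3.0)]
train = [(-1.0, 0), (2.5, 1)]  # thresholds 0.0, 1.0, 2.0 remain consistent

print(reliable_predict(H, train, 3.0))  # all consistent hypotheses agree: 1
print(reliable_predict(H, train, 0.5))  # consistent hypotheses disagree: None
```

Here the learner abstains on x = 0.5 because the thresholds 0.0 and 1.0 both fit the training data perfectly yet disagree there, mirroring the h_0(x) ≠ h_1(x) condition in the abstract.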