The Power of Iterative Filtering for Supervised Learning with (Heavy) Contamination

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates efficient learnability under strong adversarial data corruptions in supervised learning, covering bounded contamination (also known as nasty noise), heavy additive contamination in which a majority of the training set may be adversarial, and tolerant testable learning. The authors propose an iterative polynomial filtering algorithm for outlier removal and show for the first time that low-degree polynomial approximation suffices for robust learning under adversarial contamination, not merely under label noise as was widely believed. For function classes admitting sandwiching approximators, the method achieves near-optimal guarantees even under heavy additive contamination, and it yields the first efficient tolerant testable learning algorithm for functions of halfspaces over log-concave distributions, a problem that was open even in the non-tolerant case. Technically, the analysis combines hypercontractivity-based distributional analysis, sandwiching approximator constructions, moment matching, and the statistical query framework, connecting learning under adversarial contamination with recent work on learning with distribution shift.

📝 Abstract
Inspired by recent work on learning with distribution shift, we give a general outlier removal algorithm called iterative polynomial filtering and show a number of striking applications for supervised learning with contamination: (1) We show that any function class that can be approximated by low-degree polynomials with respect to a hypercontractive distribution can be efficiently learned under bounded contamination (also known as nasty noise). This is a surprising resolution to a longstanding gap between the complexity of agnostic learning and learning with contamination, as it was widely believed that low-degree approximators only implied tolerance to label noise. (2) For any function class that admits the (stronger) notion of sandwiching approximators, we obtain near-optimal learning guarantees even with respect to heavy additive contamination, where far more than $1/2$ of the training set may be added adversarially. Prior related work held only for regression and in a list-decodable setting. (3) We obtain the first efficient algorithms for tolerant testable learning of functions of halfspaces with respect to any fixed log-concave distribution. Even the non-tolerant case for a single halfspace in this setting had remained open. These results significantly advance our understanding of efficient supervised learning under contamination, a setting that has been much less studied than its unsupervised counterpart.
Problem

Research questions and friction points this paper is trying to address.

Efficient learning under bounded contamination using low-degree polynomial approximators
Near-optimal learning guarantees for heavy additive contamination via sandwiching approximators
Tolerant testable learning of halfspace functions under log-concave distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative polynomial filtering for outlier removal
Low-degree polynomial approximation under contamination
Sandwiching approximators for heavy additive contamination
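The innovations above center on an iterative filtering loop: repeatedly score samples with a low-degree polynomial whose empirical moments deviate from those of the clean reference distribution, and discard high-scoring points until the moments are certified to be close. The following is a minimal toy sketch of that loop in Python, using only degree-2 polynomials (top eigendirection of the empirical covariance) against a standard Gaussian reference; the paper's actual algorithm uses higher-degree polynomials and certified moment bounds, and all thresholds here (`var_bound`, `score_cut`) are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def iterative_filter(X, var_bound=1.0, score_cut=9.0, max_rounds=100):
    """Toy iterative-filtering sketch: while some degree-2 polynomial has
    empirical variance far above the reference bound, remove the points
    whose scores under that polynomial are implausibly large."""
    X = np.asarray(X, dtype=float)
    for _ in range(max_rounds):
        centered = X - X.mean(axis=0)
        cov = centered.T @ centered / len(X)
        # Ascending eigenvalues; under a clean N(0, I) sample every
        # eigenvalue concentrates near var_bound = 1.
        eigvals, eigvecs = np.linalg.eigh(cov)
        if eigvals[-1] <= 2 * var_bound:
            break  # moments already close to the reference: accept the sample
        v = eigvecs[:, -1]               # worst polynomial direction
        scores = (centered @ v) ** 2     # degree-2 polynomial scores
        keep = scores <= score_cut * var_bound
        if keep.all():
            break  # nothing left to filter at this threshold
        X = X[keep]
    return X
```

Example use: on a sample of standard Gaussian points plus a cluster of far-away adversarial additions, a few rounds of this loop remove the planted cluster while keeping most clean points; the heavy-contamination regime in the paper requires the stronger sandwiching machinery, which this sketch does not implement.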