🤖 AI Summary
This paper investigates provably robust learning of neural networks under distribution shift: the learner must either achieve low test error or reliably reject when distribution shift is detected, with no assumptions placed on the test distribution. Methodologically, it works in the Testable Learning with Distribution Shift (TDS learning) framework and extends it from classification to the non-convex regression setting. The results cover real-valued networks with arbitrary Lipschitz activations whenever the training distribution has strictly sub-exponential tails; for training distributions that are additionally bounded and hypercontractive, the paper gives the first fully polynomial-time TDS algorithm for one-hidden-layer networks with sigmoid activations. Key technical innovations include: (i) a kernel matrix that couples training and test samples; (ii) a data-dependent feature map; and (iii) a unified analytical toolkit integrating kernel methods, Lipschitz activation analysis, and sub-exponential tail bounds. The guarantees provide *verifiable*, instance-dependent upper bounds on test error. This work significantly advances provably robust learning of neural networks under arbitrary distribution shift.
📝 Abstract
We give the first provably efficient algorithms for learning neural networks with distribution shift. We work in the Testable Learning with Distribution Shift framework (TDS learning) of Klivans et al. (2024), where the learner receives labeled examples from a training distribution and unlabeled examples from a test distribution and must either output a hypothesis with low test error or reject if distribution shift is detected. No assumptions are made on the test distribution. All prior work in TDS learning focuses on classification, while here we must handle the setting of nonconvex regression. Our results apply to real-valued networks with arbitrary Lipschitz activations and work whenever the training distribution has strictly sub-exponential tails. For training distributions that are bounded and hypercontractive, we give a fully polynomial-time algorithm for TDS learning one-hidden-layer networks with sigmoid activations. We achieve this by importing classical kernel methods into the TDS framework using data-dependent feature maps and a type of kernel matrix that couples samples from both train and test distributions.
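The coupling idea from the abstract can be illustrated with a minimal numpy sketch. This is not the paper's algorithm: the RBF kernel, the eigendecomposition-based feature map, and the MMD-style mean-embedding comparison are illustrative stand-ins. The sketch shows the structural point: one kernel matrix is built over the union of train and test samples, so its off-diagonal blocks couple the two datasets, and features are defined relative to both samples at once.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # Gaussian (RBF) kernel from pairwise squared distances.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
n = 50
X_train = rng.normal(size=(n, 3))   # labeled training sample
X_test = rng.normal(size=(n, 3))    # unlabeled test sample

# Coupled kernel matrix over the union of train and test samples:
# the (train, test) off-diagonal blocks tie the two samples together.
Z = np.vstack([X_train, X_test])
K = rbf_kernel(Z, Z)                # shape (2n, 2n)

# Data-dependent feature map: eigendecomposition of the joint kernel
# yields finite-dimensional features Phi with Phi @ Phi.T ~= K,
# defined with respect to both samples (a Nystrom-style construction).
w, V = np.linalg.eigh(K)
keep = w > 1e-8                     # drop numerically zero directions
Phi = V[:, keep] * np.sqrt(w[keep])

# A simple shift statistic in feature space: distance between the
# train and test mean embeddings (an MMD-style check, chosen here
# for illustration; a tester would reject when this is too large).
mu_train = Phi[:n].mean(axis=0)
mu_test = Phi[n:].mean(axis=0)
shift_stat = float(np.linalg.norm(mu_train - mu_test))
print(K.shape, round(shift_stat, 4))
```

Here both samples are drawn from the same Gaussian, so the statistic is small; shifting `X_test` (e.g. adding a constant offset) inflates it, which is the behavior a testable-learning algorithm exploits when deciding whether to output a hypothesis or reject.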