Learning Neural Networks with Distribution Shift: Efficiently Certifiable Guarantees

📅 2025-02-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates provably robust learning of neural networks under distribution shift, where the goal is to either achieve low test error or reliably reject when the test distribution has drifted. Methodologically, it works in TDS learning (Testable Learning with Distribution Shift), a framework extended here to non-convex regression that imposes no assumptions on the test distribution. Its guarantees apply to real-valued networks with arbitrary Lipschitz activations whenever the training distribution has strictly sub-exponential tails, and for bounded, hypercontractive training distributions it delivers the first fully polynomial-time TDS algorithm for one-hidden-layer sigmoid networks. Key technical innovations include: (i) a coupled kernel matrix constructed over both training and test samples; (ii) a data-dependent feature mapping; and (iii) a unified analytical toolkit integrating kernel methods, Lipschitz activation analysis, and sub-exponential distribution theory. The theoretical guarantees provide *verifiable*, instance-dependent upper bounds on test error, significantly advancing provably robust learning for neural networks under arbitrary distribution shift.
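
To make the coupled-kernel idea concrete, here is a minimal Python sketch: it forms a single Gram matrix over the union of training and test samples, so the off-diagonal blocks couple the two distributions, and it reads an MMD-style shift statistic off those blocks. The RBF kernel, the statistic, and all parameter names are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch (not the paper's algorithm): build one Gram matrix over the
# concatenation of train and test samples; its off-diagonal blocks couple the
# two distributions, and an MMD-style statistic summarizes the shift.
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # K[i, j] = exp(-gamma * ||X_i - Y_j||^2); the kernel choice is an assumption.
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def coupled_kernel_matrix(X_train, X_test, gamma=1.0):
    # Stack both samples into a single (m+n) x (m+n) Gram matrix.
    Z = np.vstack([X_train, X_test])
    return rbf_kernel(Z, Z, gamma=gamma)

def mmd_statistic(K, m):
    # Biased MMD^2 estimate read off the coupled matrix's blocks:
    # mean(train x train) + mean(test x test) - 2 * mean(train x test).
    return K[:m, :m].mean() + K[m:, m:].mean() - 2.0 * K[:m, m:].mean()
```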

📝 Abstract
We give the first provably efficient algorithms for learning neural networks with distribution shift. We work in the Testable Learning with Distribution Shift framework (TDS learning) of Klivans et al. (2024), where the learner receives labeled examples from a training distribution and unlabeled examples from a test distribution and must either output a hypothesis with low test error or reject if distribution shift is detected. No assumptions are made on the test distribution. All prior work in TDS learning focuses on classification, while here we must handle the setting of nonconvex regression. Our results apply to real-valued networks with arbitrary Lipschitz activations and work whenever the training distribution has strictly sub-exponential tails. For training distributions that are bounded and hypercontractive, we give a fully polynomial-time algorithm for TDS learning one-hidden-layer networks with sigmoid activations. We achieve this by importing classical kernel methods into the TDS framework using data-dependent feature maps and a type of kernel matrix that couples samples from both train and test distributions.
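
The accept/reject protocol in the abstract can be sketched as below; the kernel ridge regression learner, the MMD-based test, and the threshold `tau` are hypothetical stand-ins for the paper's certified learner and shift test, reusing the helpers from the sketch above.

```python
def tds_learn(X_train, y_train, X_test, gamma=1.0, lam=1e-3, tau=0.05):
    # TDS protocol shape: labeled train sample + unlabeled test sample in,
    # either a hypothesis or a rejection out. Learner and test are stand-ins.
    m = len(X_train)
    K = coupled_kernel_matrix(X_train, X_test, gamma=gamma)
    if mmd_statistic(K, m) > tau:
        return None  # REJECT: evidence of distribution shift detected
    # ACCEPT: fit kernel ridge regression on the training block and return
    # a real-valued hypothesis that can be evaluated on fresh test points.
    alpha = np.linalg.solve(K[:m, :m] + lam * np.eye(m), np.asarray(y_train))
    return lambda X: rbf_kernel(X, X_train, gamma=gamma) @ alpha
```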
Problem

Research questions and friction points this paper is trying to address.

Designing provably efficient learning algorithms for neural networks
Learning when the test distribution can shift arbitrarily from training
Handling nonconvex regression, where all prior TDS work covered only classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

First provably efficient TDS algorithms for real-valued neural networks
Data-dependent feature maps that import classical kernel methods into TDS learning (see the sketch after this list)
A coupled kernel matrix built jointly over train and test samples
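
As a rough illustration of the data-dependent feature map idea from the list above, the Nyström-style construction below derives features from kernel evaluations against reference samples, whitened by the reference Gram matrix. It is a standard stand-in under assumed parameters, not the paper's map.

```python
def nystrom_feature_map(X_ref, gamma=1.0, eps=1e-8):
    # Data-dependent features: kernel evaluations against reference points,
    # whitened by K_ref^{-1/2} so that phi(x) . phi(y) approximates k(x, y).
    # A Nystrom-style stand-in, not the paper's feature mapping.
    K_ref = rbf_kernel(X_ref, X_ref, gamma=gamma)
    w, V = np.linalg.eigh(K_ref)  # symmetric eigendecomposition
    inv_sqrt = V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T
    return lambda X: rbf_kernel(X, X_ref, gamma=gamma) @ inv_sqrt
```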
Gautam Chandrasekaran
PhD Student, University of Texas at Austin
Theoretical Computer Science · Machine Learning
Adam R. Klivans
UT Austin
Lin Lin Lee
UT Austin
Konstantinos Stavropoulos
UT Austin