🤖 AI Summary
This work addresses out-of-distribution (OOD) generalization, where models must maintain performance despite shifts between the training and test distributions. Building on an assumption of feature sparsity and the principle of Occam's razor, the paper argues that sparse classifiers relying on only a few critical features generalize OOD whenever the training and test distributions sufficiently overlap on their restrictions to the relevant features. Extending classical sample complexity theory to this setting, the authors prove a probabilistic generalization bound that carries the classic result of Blumer et al. over to the distribution-shift regime, and they further generalize sparse classifiers to a subspace junta model, in which the ground truth classifier depends only on a low-dimensional linear subspace of the features.
📝 Abstract
Explaining out-of-distribution generalization has been a central problem in epistemology since Goodman's "grue" puzzle in 1946. Today it's a central problem in machine learning, including AI alignment. Here we propose a principled account of OOD generalization with three main ingredients. First, the world is always presented to experience not as an amorphous mass, but via distinguished features (for example, visual and auditory channels). Second, Occam's Razor favors hypotheses that are "sparse," meaning that they depend on as few features as possible. Third, sparse hypotheses will generalize from a training to a test distribution, provided the two distributions sufficiently overlap on their restrictions to the features that are either actually relevant or hypothesized to be. The two distributions could diverge arbitrarily on other features. We prove a simple theorem that formalizes the above intuitions, generalizing the classic sample complexity bound of Blumer et al. to an OOD context. We then generalize sparse classifiers to subspace juntas, where the ground truth classifier depends solely on a low-dimensional linear subspace of the features.
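As a toy sketch of the second and third ingredients (not the paper's construction), consider an Occam-style learner that searches for the smallest set of features on which a consistent hypothesis exists, then faces a test distribution that shifts arbitrarily on the *irrelevant* features while matching the training marginal on the relevant ones. The feature count `d`, junta size `k`, and XOR target below are all invented for illustration:

```python
import itertools
import random

random.seed(0)
d, k, n = 10, 2, 200       # total features, junta size, training samples

REL = (0, 1)               # ground-truth relevant features (hidden from the learner)

def label(x):
    # The target is a 2-junta: it depends only on the bits in REL.
    return x[REL[0]] ^ x[REL[1]]

def sample_train():
    return [random.randint(0, 1) for _ in range(d)]

def sample_test():
    x = [random.randint(0, 1) for _ in range(d)]
    # Distribution shift confined to irrelevant features: they become
    # all-ones at test time, while the marginal on REL is unchanged.
    for i in range(d):
        if i not in REL:
            x[i] = 1
    return x

train = [(x, label(x)) for x in (sample_train() for _ in range(n))]

def learn():
    # Occam's Razor as subset search: return a lookup-table hypothesis
    # on the first size-k feature subset consistent with all training data.
    for S in itertools.combinations(range(d), k):
        table, consistent = {}, True
        for x, y in train:
            key = tuple(x[i] for i in S)
            if table.setdefault(key, y) != y:
                consistent = False
                break
        if consistent:
            return S, table
    return None, None

S, table = learn()
test = [(x, label(x)) for x in (sample_test() for _ in range(1000))]
acc = sum(table.get(tuple(x[i] for i in S), 0) == y for x, y in test) / len(test)
print(S, acc)
```

Because the shift touches only features outside the learned subset, the sparse hypothesis is unaffected by it; a hypothesis that leaned on any irrelevant feature could fail arbitrarily badly under the same shift.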