Principled Out-of-Distribution Generalization via Simplicity

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the theoretical foundations underlying the strong out-of-distribution (OOD) generalization of foundation models, with a focus on compositional generalization in diffusion models. We propose a paradigm grounded in *simplicity* as an inductive bias and establish, for the first time, a rigorous OOD generalization theory based on simplicity measures, explicitly distinguishing between constant-gap and vanishing-gap settings. Theoretically, we prove that preferring simpler models ensures OOD generalization and derive the first tight sample complexity bound for learning the true simple model. Methodologically, our approach integrates simplicity quantification, regularized maximum likelihood estimation, and diffusion process analysis. In both settings, the resulting estimator achieves optimal sample efficiency, substantially outperforming standard empirical risk minimization (ERM).
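As a rough formalization of what the summary calls regularized maximum likelihood estimation with a simplicity measure, one could write a penalized objective as below. The symbols (model class $\mathcal{F}$, empirical negative log-likelihood $\hat{L}_n$, simplicity measure $S$ with smaller values meaning simpler, weight $\lambda$) are illustrative assumptions, not the paper's exact notation.

```latex
% Hypothetical sketch of simplicity-regularized MLE; all symbols are
% illustrative assumptions rather than the paper's definitions.
\hat{f} \;=\; \operatorname*{arg\,min}_{f \in \mathcal{F}}\;
  \underbrace{\hat{L}_n(f)}_{\text{empirical neg. log-likelihood on } n \text{ samples}}
  \;+\; \lambda\, \underbrace{S(f)}_{\text{penalty favoring simpler models}}
```

Letting $\lambda$ shrink appropriately as $n$ grows is the usual way such penalized estimators trade data fit against the preference for simplicity.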

📝 Abstract
Modern foundation models exhibit remarkable out-of-distribution (OOD) generalization, solving tasks far beyond the support of their training data. However, the theoretical principles underpinning this phenomenon remain elusive. This paper investigates this problem by examining the compositional generalization abilities of diffusion models in image generation. Our analysis reveals that while neural network architectures are expressive enough to represent a wide range of models -- including many with undesirable behavior on OOD inputs -- the true, generalizable model that aligns with human expectations typically corresponds to the simplest among those consistent with the training data. Motivated by this observation, we develop a theoretical framework for OOD generalization via simplicity, quantified using a predefined simplicity metric. We analyze two key regimes: (1) the constant-gap setting, where the true model is strictly simpler than all spurious alternatives by a fixed gap, and (2) the vanishing-gap setting, where the fixed gap is replaced by a smoothness condition ensuring that models close in simplicity to the true model yield similar predictions. For both regimes, we study the regularized maximum likelihood estimator and establish the first sharp sample complexity guarantees for learning the true, generalizable, simple model.
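To make the two regimes concrete, here is one plausible reading in the same illustrative notation ($f^\ast$ the true model, $g$ a spurious alternative consistent with the training data, $\Delta$ the gap, $\omega$ a modulus of continuity). These formulas are a hedged paraphrase of the abstract, not the paper's definitions.

```latex
% Constant-gap regime: every spurious model is strictly less simple
% than the true model by a fixed margin.
\text{(constant gap)} \qquad S(g) \;\ge\; S(f^\ast) + \Delta,
  \qquad \Delta > 0 \text{ fixed.}

% Vanishing-gap regime: the fixed margin is replaced by a smoothness
% condition: models close to f^* in simplicity make similar predictions,
\text{(vanishing gap)} \qquad d(g, f^\ast) \;\le\; \omega\bigl(S(g) - S(f^\ast)\bigr),
  \qquad \omega \text{ increasing},\ \omega(0) = 0,
% where d measures disagreement between the two models' predictions.
```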
Problem

Research questions and friction points this paper is trying to address.

Understanding OOD generalization principles in foundation models
Analyzing compositional generalization in diffusion image models
Developing theory for simple models aligning with human expectations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Theoretical framework for OOD generalization via simplicity (see the toy sketch after this list)
Simplicity quantified via a predefined metric
First sharp sample complexity guarantees for learning the true simple model
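As a toy illustration of the core idea, preferring the simplest model consistent with the training data, the sketch below fits a high-degree polynomial to data from a linear law and compares plain ERM against a simplicity-penalized fit on points far outside the training support. The choice of simplicity measure (norm of the degree-2-and-above coefficients) and the penalty weight are assumptions for illustration, not the paper's method.

```python
# Toy sketch (illustrative, not the paper's method): among many models
# that fit narrow training data, a simplicity penalty selects the one
# that extrapolates sensibly out of distribution.
import numpy as np

rng = np.random.default_rng(0)

# Training data: y = 2x + 1 on a narrow interval (in-distribution).
x_train = rng.uniform(-1.0, 1.0, size=40)
y_train = 2.0 * x_train + 1.0 + 0.05 * rng.normal(size=40)

degree = 9
X = np.vander(x_train, degree + 1, increasing=True)  # columns x^0 .. x^9

def fit(lam: float) -> np.ndarray:
    """Ridge-style penalized least squares; lam = 0 recovers plain ERM.
    Penalizing only coefficients of degree >= 2 encodes the hypothetical
    simplicity measure 'simpler = closer to affine'."""
    penalty = np.zeros(degree + 1)
    penalty[2:] = lam
    return np.linalg.solve(X.T @ X + np.diag(penalty), X.T @ y_train)

# OOD test points far outside the training support [-1, 1].
x_test = np.array([3.0, 5.0])
X_test = np.vander(x_test, degree + 1, increasing=True)

for lam, name in [(0.0, "ERM (no penalty)"), (10.0, "simplicity-penalized")]:
    pred = X_test @ fit(lam)
    print(f"{name}: predictions at x=3,5 -> {pred.round(2)} (true: [7, 11])")
```

The unpenalized fit typically interpolates the noise and extrapolates wildly, while the penalized fit stays close to the true line; this is the qualitative behavior the paper's theory explains.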
Jiawei Ge
Department of Operations Research and Financial Engineering, Princeton University
Amanda Wang
Department of Electrical and Computer Engineering, Princeton University
Shange Tang
Princeton University
Machine Learning, Statistics
Chi Jin
Assistant Professor, Princeton University
Machine Learning, Optimization