🤖 AI Summary
This paper addresses the theoretical foundations of zero-shot generalization: how pre-trained foundation models achieve reliable inference on downstream tasks without labeled data. Method: We propose a systematic theoretical framework that explicitly defines the target variable implicitly learned in zero-shot settings and characterizes the conditional independence structures required for generalization. The analysis integrates representations from self-supervised and multimodal contrastive learning with statistical learning theory. Contribution/Results: We provide the first theoretical characterization of the intrinsic mechanism underlying zero-shot transfer, establishing interpretable generalization bounds. We quantify the relationship between the learnability of the target task and the conditional independence properties of learned representations. Furthermore, we derive verifiable theoretical principles to guide the design of foundation models with enhanced generalization capability, particularly under zero-shot deployment.
📝 Abstract
A modern paradigm for generalization in machine learning and AI consists of pre-training a task-agnostic foundation model, generally obtained using self-supervised and multimodal contrastive learning. The resulting representations can be used for prediction on a downstream task for which no labeled data is available. We present a theoretical framework to better understand this approach, known as zero-shot prediction. We identify the target quantities that zero-shot prediction aims to learn, or learns in passing, and the key conditional independence relationships that enable its generalization ability.
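The abstract does not specify an implementation, but the paradigm it describes can be made concrete with a minimal sketch of zero-shot prediction from contrastive (CLIP-style) multimodal representations. The encoders, class names, and embeddings below are hypothetical stand-ins, assuming only that a pre-trained model maps inputs and label descriptions into a shared embedding space; the label is then chosen by nearest text embedding under cosine similarity, with no downstream labeled data.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    # Project rows onto the unit sphere (cosine-similarity geometry).
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical stand-ins for a pre-trained multimodal encoder's outputs:
# in practice these would come from a contrastively trained image encoder
# and text encoder sharing one embedding space.
d = 16
class_names = ["cat", "dog", "car"]
text_emb = normalize(rng.normal(size=(len(class_names), d)))  # one per label prompt

# Synthesize an "image" embedding close to the "dog" text embedding so the
# similarity-based rule has a clear winner in this toy example.
image_emb = normalize(text_emb[1] + 0.1 * rng.normal(size=d))

# Zero-shot prediction: no labeled downstream data is used; the predicted
# label is the class whose text embedding is most cosine-similar.
scores = image_emb @ text_emb.T
pred = class_names[int(np.argmax(scores))]
print(pred)  # prints "dog"
```

The sketch only illustrates the inference rule the abstract refers to; the paper's contribution is the theory of *why* such representations transfer, via the conditional independence structure of the learned embeddings.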