🤖 AI Summary
Deep neural networks often fail to achieve compositional generalization—i.e., generalizing to novel combinations of known primitives—despite empirical success in certain tasks.
Method: We derive the first necessary and sufficient conditions for compositional generalization: (i) the model’s computational graph must structurally match the true compositional hierarchy, and (ii) each primitive must encode only the minimal task-relevant information during training. We establish this via rigorous theoretical analysis, formal mathematical proof, and constructive minimal examples, unifying the roles of architecture design, regularization, and data distribution.
Contribution/Results: Our conditions are formally verifiable *a priori*, enabling predictive assessment of generalization potential before training. They provide a principled foundation for interpretable AI and structured model design, bridging theoretical guarantees with practical architectural constraints. The framework explains why many standard architectures fail—and how targeted structural alignment and information minimality can restore compositional generalization.
📝 Abstract
Compositional generalization is a crucial property in artificial intelligence, enabling models to handle novel combinations of known components. While most deep learning models lack this capability, certain models succeed in specific tasks, suggesting the existence of governing conditions. This paper derives a necessary and sufficient condition for compositional generalization in neural networks. Conceptually, it requires that (i) the computational graph matches the true compositional structure, and (ii) components encode just enough information in training. The condition is supported by mathematical proofs. This criterion combines aspects of architecture design, regularization, and training data properties. A carefully designed minimal example illustrates an intuitive understanding of the condition. We also discuss the potential of the condition for assessing compositional generalization before training. This work is a fundamental theoretical study of compositional generalization in neural networks.