Data Distributional Properties As Inductive Bias for Systematic Generalization

📅 2025-02-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Deep neural networks exhibit poor systematic generalization. This paper demonstrates that intrinsic distributional properties of training data, namely diversity, burstiness, and latent interventions, serve as strong inductive biases that substantially enhance systematic generalization in a multi-modal language model, a role of training data that prior work has largely overlooked. Using latent-variable intervention experiments, normalized mutual information (NMI) analysis, and geometric measurements of neural representations (e.g., vector parallelism), the authors identify NMI between latent attributes as a strong predictor of out-of-distribution (OOD) generalization: lower NMI induces more parallel structure in representation space, a geometry associated with reasoning by analogy. Among the three factors, data diversity contributes the most, yielding an 89% absolute accuracy gain on the most affected property. The findings show how the internal statistical structure of training data drives generalization beyond the training distribution and suggest that generalization can be shaped through data-driven inductive biases.
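To make the NMI analysis concrete, the sketch below estimates normalized mutual information between two discrete latent attributes of a training set. The attribute names, the toy data, and the geometric-mean normalization are illustrative assumptions; the paper's actual latent factors and NMI convention may differ.

```python
# Minimal sketch: NMI between two discrete latent attributes of a dataset.
# The attributes ("shape", "color") and the normalization are hypothetical;
# the paper's actual latent factors and NMI convention may differ.
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a discrete label sequence."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def mutual_information(x, y):
    """I(X;Y) estimated from the empirical joint distribution."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * np.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

def normalized_mutual_information(x, y):
    """NMI(X;Y) = I(X;Y) / sqrt(H(X) * H(Y)); 0 = independent, 1 = identical."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0.0 or hy == 0.0:
        return 0.0
    return mutual_information(x, y) / np.sqrt(hx * hy)

# Hypothetical latent attributes of a small training set.
shape         = ["cube"] * 3 + ["sphere"] * 3 + ["cone"] * 3
color_coupled = ["red"]  * 3 + ["blue"]   * 3 + ["green"] * 3   # color determined by shape
color_indep   = ["red", "blue", "green"] * 3                    # color independent of shape

print(normalized_mutual_information(shape, color_coupled))  # ~1.0: attributes fully coupled
print(normalized_mutual_information(shape, color_indep))    # ~0.0: attributes independent
```

In this toy setting, latent intervention and increased diversity would both push the training distribution toward the low-NMI (independent) case.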

📝 Abstract
Deep neural networks (DNNs) struggle with systematic generalization (SG). Several studies have explored promoting SG through novel architectures, loss functions, or training methodologies. Few studies, however, have focused on the role of training data properties in promoting SG. In this work, we investigate the impact of certain data distributional properties as inductive biases for the SG ability of a multi-modal language model. To this end, we study three different properties. First, data diversity, instantiated as an increase in the number of possible values a latent property may take in the training distribution. Second, burstiness, where we probabilistically restrict the number of possible values of latent factors on particular inputs during training. Third, latent intervention, where a particular latent factor is altered randomly during training. We find that all three factors significantly enhance SG, with diversity contributing an 89% absolute increase in accuracy on the most affected property. Through a series of experiments, we test various hypotheses to understand why these properties promote SG. Finally, we find that Normalized Mutual Information (NMI) between latent attributes in the training distribution is strongly predictive of out-of-distribution generalization. We find that one mechanism by which lower NMI induces SG lies in the geometry of representations. In particular, lower NMI induces more parallelism in the model's neural representations (i.e., input features coded in parallel neural vectors), a property related to the capacity for reasoning by analogy.
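As a concrete illustration of the parallelism measurement mentioned in the abstract, the sketch below scores how parallel the representation changes are when the same latent change is applied to different inputs, using cosine similarity between difference vectors. The representation shapes, pair construction, and cosine-based score are assumptions made for illustration, not necessarily the paper's exact protocol.

```python
# Minimal sketch: quantifying parallelism of neural representations.
# Assumption: pairs (a, b) encode the same latent change applied to different
# base inputs; parallelism is the average cosine similarity between their
# difference vectors. This is an illustrative metric, not necessarily the
# paper's exact measurement.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def parallelism_score(pairs):
    """Average pairwise cosine similarity between difference vectors."""
    diffs = [b - a for a, b in pairs]
    scores = [cosine(diffs[i], diffs[j])
              for i in range(len(diffs)) for j in range(i + 1, len(diffs))]
    return float(np.mean(scores))

# Hypothetical hidden-state representations for inputs that differ
# only in one latent attribute.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 16))                       # four base inputs, 16-d reps
shared_shift = rng.normal(size=16)                    # same latent change for all
parallel_pairs = [(x, x + shared_shift) for x in base]
random_pairs = [(x, x + rng.normal(size=16)) for x in base]

print(parallelism_score(parallel_pairs))  # ~1.0: difference vectors are parallel
print(parallelism_score(random_pairs))    # ~0.0: differences point in random directions
```

Under the paper's finding, representations trained on low-NMI data would behave more like the first case, supporting analogy-style generalization.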
Problem

Research questions and friction points this paper is trying to address.

Impact of data distributional properties on systematic generalization
Role of data diversity, burstiness, and latent intervention in enhancing SG
Whether Normalized Mutual Information (NMI) between latent attributes predicts out-of-distribution generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Increased data diversity boosts systematic generalization.
Burstiness in training improves model generalization (see the sampling sketch after this list).
Latent intervention increases parallelism in neural representations.
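Below is a minimal sketch of how burstiness and latent intervention could be instantiated when sampling training episodes. The latent factors (shape, color), the per-episode restriction of values, and the random-intervention rule are illustrative assumptions, not the paper's exact data-generation procedure.

```python
# Minimal sketch: bursty episode sampling and random latent intervention.
# The latent factors, the burstiness rule (restrict each episode to a small
# subset of values), and the intervention rule (randomly resample one factor)
# are illustrative assumptions, not the paper's actual procedure.
import random

SHAPES = ["cube", "sphere", "cone", "torus", "pyramid"]   # diversity = size of this pool
COLORS = ["red", "blue", "green", "yellow", "purple"]

def sample_episode(n_items=8, burst_values=2, p_intervene=0.2, seed=None):
    """Sample one training episode of (shape, color) items.

    burst_values: burstiness; each episode uses only this many distinct
                  values per latent factor instead of the full pool.
    p_intervene:  probability of randomly overwriting an item's color factor.
    """
    rng = random.Random(seed)
    episode_shapes = rng.sample(SHAPES, burst_values)     # burstiness: restrict values
    episode_colors = rng.sample(COLORS, burst_values)
    items = []
    for _ in range(n_items):
        shape = rng.choice(episode_shapes)
        color = rng.choice(episode_colors)
        if rng.random() < p_intervene:                    # latent intervention
            color = rng.choice(COLORS)                    # resample from the full pool
        items.append((shape, color))
    return items

print(sample_episode(seed=0))
```

In this toy setup, larger value pools raise diversity, the per-episode restriction produces bursty co-occurrence statistics, and interventions decouple the two factors, lowering their NMI.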
🔎 Similar Papers