Systematic Generalization in Language Models Scales with Information Entropy

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Language models exhibit poor systematic generalization—e.g., semantically equivalent reordering of the input or compositional reuse of known concepts in novel contexts—and existing benchmarks lack interpretable, quantitative measures of task difficulty. Method: This paper formalizes systematic generalization difficulty as the information entropy of the distribution of component parts in the training data, establishing an entropy-based framework for quantifying the difficulty of sequence-to-sequence tasks. Contribution/Results: Through cross-architecture empirical analysis (including Transformers), the authors show that model performance scales with entropy, degrading monotonically as entropy decreases, which connects systematic generalization to information efficiency. They further show that strong generalization emerges in high-entropy settings without structural priors, while low-entropy tasks serve as robust, reliable benchmarks for evaluating systematic generalization. These findings provide both a theoretically grounded difficulty metric and actionable guidance for model design and evaluation.

📝 Abstract
Systematic generalization remains challenging for current language models, which are known both to be sensitive to semantically similar permutations of the input and to struggle with known concepts presented in novel contexts. Although benchmarks exist for assessing compositional behavior, it is unclear how to measure the difficulty of a systematic generalization problem. In this work, we show how one aspect of systematic generalization can be described by the entropy of the distribution of component parts in the training data. We formalize a framework for measuring entropy in a sequence-to-sequence task and find that the performance of popular model architectures scales with the entropy. Our work connects systematic generalization to information efficiency, and our results indicate that success at high entropy can be achieved even without built-in priors, and that success at low entropy can serve as a target for assessing progress towards robust systematic generalization.
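The central quantity here is the Shannon entropy of the component distribution in the training data. As a minimal sketch (not the paper's actual framework, which is defined over sequence-to-sequence tasks), the toy function below treats tokens as components and computes the entropy of their empirical distribution; the `component_entropy` name and the toy datasets are illustrative assumptions:

```python
import math
from collections import Counter

def component_entropy(sequences):
    """Shannon entropy (bits) of the empirical distribution of
    components (here: tokens) across all training sequences."""
    counts = Counter(tok for seq in sequences for tok in seq)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Low-entropy training set: one component dominates.
low = [["jump"], ["jump"], ["jump"], ["walk"]]
# High-entropy training set: components are uniformly distributed.
high = [["jump"], ["walk"], ["run"], ["look"]]

print(component_entropy(low))   # ~0.811 bits
print(component_entropy(high))  # 2.0 bits (uniform over 4 components)
```

Under this framing, the paper's claim is that model performance on systematic generalization scales with this entropy, so the low-entropy set above would correspond to the harder benchmark regime.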
Problem

Research questions and friction points this paper is trying to address.

Measuring difficulty of systematic generalization in language models
Linking systematic generalization to information entropy
Assessing model performance based on training data entropy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Measures generalization difficulty via training data entropy
Links model performance to information efficiency metrics
Shows success at high entropy without built-in priors