🤖 AI Summary
Language models exhibit poor systematic generalization: for example, they fail on semantically equivalent reorderings of the input and on compositional reuse of known concepts in novel contexts. In addition, existing benchmarks lack interpretable, quantitative measures of task difficulty.
Method: This paper formalizes systematic generalization difficulty as the information entropy of the distribution of component parts in the training data, establishing an entropy-based framework for quantifying the difficulty of sequence-to-sequence tasks.
Contribution/Results: Cross-architecture experiments (including Transformers) show that model performance scales with the entropy of the training distribution: strong generalization emerges in high-entropy settings even without structural priors, while low-entropy tasks remain difficult and thus serve as demanding benchmarks for systematic generalization. These findings connect systematic generalization to information efficiency, yielding both a theoretically grounded difficulty metric and actionable guidance for model design and evaluation.
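As an illustration of the kind of measure described above (the paper's exact definition may differ), the Shannon entropy of the empirical component distribution in the training data would be written as:

```latex
H(C) = -\sum_{c \in C} p(c) \log_2 p(c)
```

where \(C\) is the set of component parts appearing in the training set and \(p(c)\) is the empirical frequency of component \(c\). Under this reading, a low-entropy training set concentrates probability mass on few components, so a model sees less varied evidence for how components combine.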
📝 Abstract
Systematic generalization remains challenging for current language models, which are known both to be sensitive to semantically similar permutations of the input and to struggle with known concepts presented in novel contexts. Although benchmarks exist for assessing compositional behavior, it is unclear how to measure the difficulty of a systematic generalization problem. In this work, we show how one aspect of systematic generalization can be described by the entropy of the distribution of component parts in the training data. We formalize a framework for measuring entropy in a sequence-to-sequence task and find that the performance of popular model architectures scales with the entropy. Our work connects systematic generalization to information efficiency, and our results indicate that success at high entropy can be achieved even without built-in priors, and that success at low entropy can serve as a target for assessing progress towards robust systematic generalization.
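To make the abstract's notion concrete, the following is a minimal sketch of computing the entropy of a component distribution over a toy sequence-to-sequence training set. The whitespace tokenization and the example data are hypothetical choices for illustration; the paper's actual definition of "component" may differ.

```python
from collections import Counter
import math

def component_entropy(training_examples):
    """Shannon entropy (bits) of the distribution of component parts
    across the source side of a seq2seq training set.

    Components are approximated here as whitespace-separated tokens;
    this is an assumption, not the paper's definition.
    """
    counts = Counter(
        tok
        for src, _tgt in training_examples
        for tok in src.split()
    )
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy SCAN-style data: four distinct components, each appearing once,
# gives the maximal entropy for four symbols: log2(4) = 2 bits.
data = [("jump twice", "JUMP JUMP"), ("walk left", "LTURN WALK")]
print(component_entropy(data))  # → 2.0
```

A skewed training set (e.g., one component repeated many times) would score lower, which under the paper's framing corresponds to a harder generalization target.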