Memorizing Long-tail Data Can Help Generalization Through Composition

📅 2025-10-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether memorizing long-tailed data enhances model generalization—particularly on unseen rare feature combinations absent during training. We propose a “memory–composition synergy” theoretical framework, proving that memorizing tail instances significantly improves generalization even in zero-shot composition settings, and identifying model architecture as a critical determinant of compositional capability. Leveraging both theoretical analysis of linear models and empirical evaluation on controlled neural network benchmarks, we validate the mechanism’s efficacy. Results demonstrate that memorizing long-tailed examples systematically boosts prediction accuracy on unseen rare compositions, with this effect persisting robustly across nonlinear models. Our core contribution is the first dual-path (theoretical and experimental) characterization of how memory and composition jointly enable generalization—offering novel insights for long-tailed learning and generalization theory.

📝 Abstract
Deep learning has led researchers to rethink the relationship between memorization and generalization. In many settings, memorization does not hurt generalization due to implicit regularization, and may even help by memorizing long-tailed examples. In this paper, we consider the synergy between memorization and simple composition -- the ability to make correct predictions on a combination of long-tailed features. Theoretically, we show that in a linear setting, memorization together with composition can help the model make correct predictions on rare test examples that require a combination of long-tailed features, even if such combinations were never observed in the training data. Experiments with neural network architectures on simple data show that the theoretical insight extends beyond the linear setting, and we further observe that the composition capability of the model depends on its architecture.
Problem

Research questions and friction points this paper is trying to address.

Explores how memorizing long-tail data aids generalization through feature composition
Examines synergy between memorization and composition for rare test examples
Investigates how model architecture affects composition capability with long-tailed features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Memorizing long-tail data aids generalization through composition
Memorization enables prediction on unseen feature combinations
Model architecture influences composition capability for rare examples
Mo Zhou
University of Washington
Haoyang Ma
HKUST
random program generator · compiler testing · bug localization
Rong Ge
Duke University