🤖 AI Summary
This work investigates whether memorizing long-tailed data enhances model generalization, particularly on rare feature combinations absent from the training set. We propose a "memory–composition synergy" framework, showing that memorizing tail instances improves generalization even in zero-shot composition settings, and identifying model architecture as a key determinant of compositional capability. Combining theoretical analysis of linear models with empirical evaluation on controlled neural network benchmarks, we validate the proposed mechanism. Results show that memorizing long-tailed examples systematically boosts prediction accuracy on unseen rare compositions, and that this effect persists in nonlinear models. Our core contribution is a dual-path (theoretical and experimental) characterization of how memorization and composition jointly enable generalization, offering new insights for long-tailed learning and generalization theory.
📝 Abstract
Deep learning has led researchers to rethink the relationship between memorization and generalization. In many settings, memorization does not hurt generalization due to implicit regularization, and may even help by memorizing long-tailed examples. In this paper, we consider the synergy between memorization and simple composition -- the ability to make correct predictions on a combination of long-tailed features. Theoretically, we show that in a linear setting, memorization together with composition can help the model make correct predictions on rare test examples that require a combination of long-tailed features, even if such combinations were never observed in the training data. Experiments with neural network architectures on simple data show that this theoretical insight extends beyond the linear setting, and we further observe that the model's composition capability depends on its architecture.
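The linear intuition can be sketched with a toy construction (our own illustration, not the paper's exact setup): if a linear model fits each rare feature from even a single "memorized" training example, its additive structure lets it predict correctly on an unseen combination of those rare features.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                                            # features 3 and 4 are "long-tailed"
w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])   # ground-truth linear labels

# Head examples: random binary features, but rare features never active.
X_head = rng.integers(0, 2, size=(200, d)).astype(float)
X_head[:, 3:] = 0.0

# One training example per rare feature, each appearing in isolation;
# the *combination* of features 3 and 4 never occurs in training.
x_rare3 = np.zeros(d); x_rare3[3] = 1.0
x_rare4 = np.zeros(d); x_rare4[4] = 1.0

X = np.vstack([X_head, x_rare3, x_rare4])
y = X @ w_true                                   # noiseless linear labels

# Least-squares fit: the two rare rows are effectively memorized.
w_hat = np.linalg.pinv(X) @ y

# Unseen test point combining both rare features: the fitted model
# composes the two memorized effects additively.
x_test = np.zeros(d); x_test[3] = 1.0; x_test[4] = 1.0
pred = x_test @ w_hat                            # ≈ w_true[3] + w_true[4] = 2.0
```

The example works precisely because linearity makes composition automatic once each rare direction is fit; the paper's experiments probe whether nonlinear architectures retain this behavior.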