🤖 AI Summary
This work addresses the fundamental distinction between unintended memorization and generalization in language models. We propose the first formal theoretical framework that uses counterfactual control of generalization to precisely quantify unintended memorization, and design a membership-inference-based evaluation paradigm validated through large-scale parameter sweeps (500K–1.5B parameters) and empirical Transformer experiments. Key contributions include: (1) identification of a saturation point in unintended memorization at which the "grokking" phenomenon begins; (2) an estimate of memorization capacity in GPT-style models of approximately 3.6 bits per parameter; and (3) novel scaling laws linking the memorization–generalization trade-off, training-data scale, and membership-inference risk. These results establish a rigorous theoretical foundation and empirical benchmark for measuring model memorization, assessing privacy risks, and understanding training dynamics.
📝 Abstract
We propose a new method for estimating how much a model "knows" about a datapoint and use it to measure the capacity of modern language models. Prior studies of language model memorization have struggled to disentangle memorization from generalization. We formally separate memorization into two components: *unintended memorization*, the information a model contains about a specific dataset, and *generalization*, the information a model contains about the true data-generation process. When we completely eliminate generalization, we can compute the total memorization, which provides an estimate of model capacity: our measurements estimate that GPT-style models have a capacity of approximately 3.6 bits per parameter. We train language models on datasets of increasing size and observe that models memorize until their capacity fills, at which point "grokking" begins and unintended memorization decreases as models begin to generalize. We train hundreds of transformer language models ranging from 500K to 1.5B parameters and produce a series of scaling laws relating model capacity and data size to membership inference.
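As a back-of-the-envelope illustration of what the ~3.6 bits-per-parameter estimate implies, the sketch below converts parameter counts into total memorization capacity, using the smallest and largest model sizes from the sweep. The constant and the helper function are illustrative assumptions, not part of the paper's method.

```python
# Illustrative only: the 3.6 bits/parameter figure is the paper's reported
# estimate for GPT-style models; everything else here is a hypothetical sketch.
BITS_PER_PARAM = 3.6

def capacity_bits(n_params: int) -> float:
    """Total unintended-memorization capacity implied by the estimate."""
    return BITS_PER_PARAM * n_params

# Roughly, once the training set exceeds this many bits of information,
# capacity saturates and (per the abstract) unintended memorization starts
# to fall as the model begins to generalize.
for n in (500_000, 1_500_000_000):  # smallest/largest models in the sweep
    mb = capacity_bits(n) / 8 / 1e6  # bits -> megabytes
    print(f"{n:>13,} params -> ~{mb:.1f} MB of memorized data")
```

For example, at 1.5B parameters the estimate works out to about 5.4 gigabits, i.e. on the order of hundreds of megabytes of memorized training data.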