🤖 AI Summary
This study investigates how memorization, defined as achieving zero training error, affects generalization in overparameterized linear models. Working in a Bayesian framework, the authors combine Fisher information, properties of the prior distribution, and generalization-error theory to establish, for the first time, a precise noise threshold determined jointly by the Fisher information and the variance of the prior. Below this threshold, interpolating solutions that perfectly fit the training data (i.e., memorize) improve generalization; above it, such memorization becomes harmful. The work thus provides a quantitative link between the structure of the prior and generalization behavior, giving exact conditions and a theoretical foundation for the memorization-generalization trade-off in overparameterized settings.
📝 Abstract
We examine the connection between training error and generalization error for arbitrary estimation procedures, working in an overparameterized linear model under general priors in a Bayesian setting. We identify determining factors inherent to the prior distribution $\pi$ and give explicit conditions under which optimal generalization requires the training error to be (i) near-interpolating relative to the noise size (i.e., memorization is necessary), or (ii) close to the noise level (i.e., overfitting is harmful). Remarkably, these phenomena occur when the noise crosses thresholds determined by the Fisher information and the variance parameters of the prior $\pi$.
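The qualitative phenomenon can be illustrated numerically. The sketch below is a minimal simulation, not the paper's construction: the isotropic Gaussian design, the Gaussian prior with variance $1/d$, the specific dimensions, and the choice of the minimum-norm interpolator as the "memorizing" estimator are all illustrative assumptions. It only shows that an interpolating (zero-training-error) estimator pays a growing price as the noise level increases, consistent with the thresholded behavior described above.

```python
import numpy as np

rng = np.random.default_rng(0)


def min_norm_interpolator(X, y):
    # Minimum-norm solution achieving zero training error:
    # a "memorizing" estimator in the overparameterized regime (d > n).
    return np.linalg.pinv(X) @ y


def avg_param_error(sigma, n=50, d=200, trials=30):
    # Average squared parameter error of the interpolator at noise level
    # sigma, with an (illustrative) Gaussian prior of variance 1/d on theta.
    errs = []
    for _ in range(trials):
        theta = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)
        X = rng.normal(size=(n, d))
        y = X @ theta + sigma * rng.normal(size=n)
        theta_hat = min_norm_interpolator(X, y)
        errs.append(np.sum((theta_hat - theta) ** 2))
    return float(np.mean(errs))


low = avg_param_error(sigma=0.05)   # small noise: interpolation is cheap
high = avg_param_error(sigma=2.0)   # large noise: interpolation is costly
print(low < high)
```

Here the error of the memorizing estimator degrades with the noise scale, mirroring the regime change the paper quantifies exactly via the Fisher information and variance of the prior.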