Taking a Big Step: Large Learning Rates in Denoising Score Matching Prevent Memorization

📅 2025-02-05

📈 Citations: 0

✨ Influential: 0

career value

196K/year

🤖 AI Summary

This work addresses overfitting in diffusion models—specifically, memorization (i.e., mechanical replication of training data) arising from denoising score matching. We identify large learning rates as an implicit regularizer. Through one-dimensional theoretical analysis and modeling via a two-layer neural network, coupled with stochastic gradient descent dynamics, we formally prove—for the first time—that under low-noise conditions, large learning rates prevent convergence to highly irregular empirical optimal score solutions, thereby inherently suppressing memorization. Theoretical analysis further reveals that this stems from the failure of stable convergence under non-zero excess risk. Empirical validation confirms that this phenomenon persists beyond idealized settings: large learning rates effectively mitigate memorization and improve generalization in high-dimensional image generation tasks.

Technology Category

Application Category

📝 Abstract

Denoising score matching plays a pivotal role in the performance of diffusion-based generative models. However, the empirical optimal score--the exact solution to the denoising score matching--leads to memorization, where generated samples replicate the training data. Yet, in practice, only a moderate degree of memorization is observed, even without explicit regularization. In this paper, we investigate this phenomenon by uncovering an implicit regularization mechanism driven by large learning rates. Specifically, we show that in the small-noise regime, the empirical optimal score exhibits high irregularity. We then prove that, when trained by stochastic gradient descent with a large enough learning rate, neural networks cannot stably converge to a local minimum with arbitrarily small excess risk. Consequently, the learned score cannot be arbitrarily close to the empirical optimal score, thereby mitigating memorization. To make the analysis tractable, we consider one-dimensional data and two-layer neural networks. Experiments validate the crucial role of the learning rate in preventing memorization, even beyond the one-dimensional setting.

Problem

Research questions and friction points this paper is trying to address.

Prevent memorization in denoising score matching

Large learning rates reduce sample replication

Implicit regularization mitigates training data replication

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large learning rates

Prevent memorization

Implicit regularization mechanism

🔎 Similar Papers

No similar papers found.