Predicting and analyzing memorization within fine-tuned Large Language Models

📅 2024-09-27
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Large language models (LLMs) tend to memorize training samples during fine-tuning for classification tasks, posing significant risks of sensitive information leakage; however, existing approaches lack the capability to *a priori* identify which samples are prone to memorization. Method: This paper introduces the first *a priori* memorization-risk detection framework based on Sliced Mutual Information (SMI), enabling efficient identification of high-risk samples early in training. The approach integrates information-theoretic analysis with dynamic modeling of training trajectories, offers theoretical guarantees of discriminative consistency, and supports low-overhead, real-time deployment. Contribution/Results: Experiments demonstrate substantial improvements in predictive accuracy for memorization risk. The framework provides a lightweight, proactive monitoring capability for LLM data privacy protection—shifting memorization research from *post hoc* explanation toward *preemptive* mitigation.

📝 Abstract
Large Language Models have received significant attention due to their ability to solve a wide range of complex tasks. However, these models memorize a significant proportion of their training data, posing a serious threat when that data is disclosed at inference time. To mitigate this unintended memorization, it is crucial to understand what elements are memorized and why. Most existing works provide a posteriori explanations, which are of limited interest in practice. To address this gap, we propose a new approach based on sliced mutual information to detect memorized samples a priori, in a classification setting. It is efficient from the early stages of training and is readily adaptable to practical scenarios. Our method is supported by new theoretical results that we prove, and requires a low computational budget. We obtain strong empirical results, paving the way for systematic inspection and protection of these vulnerable samples before memorization happens.
Problem

Research questions and friction points this paper is trying to address.

Predicting memorization in fine-tuned LLMs for classification
Understanding why and what data LLMs memorize during training
Detecting memorized samples early to protect vulnerable data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Detect memorized samples a priori in LLMs
Effective from early training stages
Low computational budget required
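The quantity underlying the method, sliced mutual information, averages mutual information between random one-dimensional projections of a representation and the labels. The paper's exact estimator and training-trajectory modeling are not reproduced here; the sketch below is a minimal, hypothetical illustration using a histogram plug-in MI estimate per slice, with the function name, bin count, and slice count all being assumptions for illustration.

```python
import numpy as np

def sliced_mutual_information(X, y, n_slices=100, n_bins=16, seed=0):
    """Monte-Carlo estimate of sliced MI between features X (n, d) and labels y (n,).

    Projects X onto random unit directions and averages a histogram-based
    plug-in estimate of I(theta^T X; y) over the 1-D slices.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    classes = np.unique(y)
    total = 0.0
    for _ in range(n_slices):
        # draw a random direction on the unit sphere
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        proj = X @ theta  # 1-D slice of the representation
        # discretize the projection and estimate I(proj; y) from counts
        edges = np.histogram_bin_edges(proj, bins=n_bins)
        idx = np.clip(np.digitize(proj, edges) - 1, 0, n_bins - 1)
        joint = np.zeros((n_bins, len(classes)))
        for ci, c in enumerate(classes):
            joint[:, ci] = np.bincount(idx[y == c], minlength=n_bins)
        joint /= n
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        nz = joint > 0
        # plug-in MI = KL(joint || px ⊗ py), always non-negative
        total += np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz]))
    return total / n_slices
```

Intuitively, representations that carry sample-specific information about the labels yield higher SMI along random slices, which is what makes the quantity usable as an early-training risk signal; a production version would replace the histogram estimator with something better behaved in high dimensions.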
Jérémie Dentan
LIX (Ecole Polytechnique, IP Paris, CNRS)
Davide Buscaldi
Associate Professor (HDR), LIPN, Université Sorbonne Paris Nord
LLMs · Information Retrieval · Ontology Learning · Geographic IR · Text Mining
A. Shabou
Crédit Agricole SA
Sonia Vanier
LIX (Ecole Polytechnique, IP Paris, CNRS)