Predicting and analyzing memorization within fine-tuned Large Language Models

📅 2024-09-27
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Large language models (LLMs) tend to memorize training samples during fine-tuning for classification tasks, posing significant risks of sensitive information leakage; however, existing approaches lack the capability to *a priori* identify which samples are prone to memorization. Method: This paper introduces the first *a priori* memorization-risk detection framework based on Sliced Mutual Information (SMI), enabling efficient identification of high-risk samples early in training. The approach integrates information-theoretic analysis with dynamic modeling of training trajectories, offers theoretical guarantees of discriminative consistency, and supports low-overhead, real-time deployment. Contribution/Results: Experiments demonstrate substantial improvements in predictive accuracy for memorization risk. The framework provides a lightweight, proactive monitoring capability for LLM data privacy protection—shifting memorization research from *post hoc* explanation toward *preemptive* mitigation.

📝 Abstract
Large Language Models have received significant attention due to their ability to solve a wide range of complex tasks. However, these models memorize a significant proportion of their training data, posing a serious threat when that data is disclosed at inference time. To mitigate this unintended memorization, it is crucial to understand what elements are memorized and why. Most existing works provide a posteriori explanations, which are of limited interest in practice. To address this gap, we propose a new approach based on sliced mutual information to detect memorized samples a priori, in a classification setting. It is efficient from the early stages of training and is readily adaptable to practical scenarios. Our method is supported by new theoretical results that we prove, and requires a low computational budget. We obtain strong empirical results, paving the way for systematic inspection and protection of these vulnerable samples before memorization happens.
Problem

Research questions and friction points this paper is trying to address.

Predicting memorization in fine-tuned LLMs for classification
Understanding why and what data LLMs memorize during training
Detecting memorized samples early to protect vulnerable data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Detect memorized samples a priori in LLMs
Effective from early training stages
Low computational budget required
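The quantity underlying the method, sliced mutual information, averages mutual information between random one-dimensional projections of a representation and the labels. The paper's exact estimator and training-trajectory modeling are not reproduced here; the sketch below is a minimal, hypothetical illustration using a histogram plug-in MI estimate per slice, with the function name, bin count, and slice count all being assumptions for illustration.

```python
import numpy as np

def sliced_mutual_information(X, y, n_slices=100, n_bins=16, seed=0):
    """Monte-Carlo estimate of sliced MI between features X (n, d) and labels y (n,).

    Projects X onto random unit directions and averages a histogram-based
    plug-in estimate of I(theta^T X; y) over the 1-D slices.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    classes = np.unique(y)
    total = 0.0
    for _ in range(n_slices):
        # draw a random direction on the unit sphere
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        proj = X @ theta  # 1-D slice of the representation
        # discretize the projection and estimate I(proj; y) from counts
        edges = np.histogram_bin_edges(proj, bins=n_bins)
        idx = np.clip(np.digitize(proj, edges) - 1, 0, n_bins - 1)
        joint = np.zeros((n_bins, len(classes)))
        for ci, c in enumerate(classes):
            joint[:, ci] = np.bincount(idx[y == c], minlength=n_bins)
        joint /= n
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        nz = joint > 0
        # plug-in MI = KL(joint || px ⊗ py), always non-negative
        total += np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz]))
    return total / n_slices
```

Intuitively, representations that carry sample-specific information about the labels yield higher SMI along random slices, which is what makes the quantity usable as an early-training risk signal; a production version would replace the histogram estimator with something better behaved in high dimensions.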
Jérémie Dentan
LIX (Ecole Polytechnique, IP Paris, CNRS)
Davide Buscaldi
Associate Professor (HDR), LIPN, Université Sorbonne Paris Nord
LLMs · Information Retrieval · Ontology Learning · Geographic IR · Text Mining
A. Shabou
Crédit Agricole SA
Sonia Vanier
LIX (Ecole Polytechnique, IP Paris, CNRS)