Adaptive Token-Weighted Differential Privacy for LLMs: Not All Tokens Require Equal Protection

📅 2025-09-27
🤖 AI Summary
Large language models (LLMs) are prone to memorizing sensitive or private data, posing severe privacy risks. Existing differentially private stochastic gradient descent (DP-SGD) applies uniform noise to all tokens, incurring high computational overhead and substantial accuracy degradation. To address this, we propose Adaptive Token-Weighted Differential Privacy (ATDP), a method that dynamically allocates gradient noise intensity based on token-level sensitivity, applying stronger perturbation to sensitive tokens and milder perturbation to non-sensitive ones. ATDP integrates red-teaming-guided fine-tuning preprocessing with targeted noise injection, enabling plug-and-play privacy enhancement within the DP-SGD framework. Experiments demonstrate that ATDP achieves comparable or stronger privacy guarantees (measured by ε-DP) while reducing DP fine-tuning time by approximately 90% with negligible accuracy loss. Overall, ATDP significantly improves the three-way trade-off among privacy preservation, model utility, and training efficiency.

📝 Abstract
Large language models (LLMs) frequently memorize sensitive or personal information, raising significant privacy concerns. Existing variants of differentially private stochastic gradient descent (DP-SGD) inject uniform noise into every gradient step, significantly extending training time and reducing model accuracy. We propose that concentrating noise primarily on gradients associated with sensitive tokens can substantially decrease DP training time, strengthen the protection of sensitive information, and simultaneously preserve the model's performance on non-sensitive data. We operationalize this insight through Adaptive Token-Weighted Differential Privacy (ATDP), a modification of vanilla DP-SGD that adaptively assigns different gradient weights to sensitive and non-sensitive tokens. By employing a larger noise scale at the early stage of training, ATDP rapidly disrupts memorization of sensitive content. As a result, ATDP requires only a few additional epochs of lightweight post-processing after standard fine-tuning, injecting targeted noise primarily into parameters corresponding to sensitive tokens and thus minimally affecting the model's general capabilities. ATDP can be seamlessly integrated into any existing DP-based fine-tuning pipeline or applied directly to non-private models as a fast privacy-enhancing measure. Additionally, combined with an initial redacted fine-tuning phase, ATDP forms a streamlined DP pipeline that achieves canary protection comparable to state-of-the-art DP-SGD methods while significantly reducing the computational overhead of DP fine-tuning: training time is shortened by approximately 90 percent, with comparable or superior privacy protection and minimal accuracy degradation.
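The core mechanism described above — standard DP-SGD clipping followed by noise whose scale depends on per-token sensitivity — can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' implementation: the function names (`atdp_noise_scales`, `atdp_step`), the linear sensitivity-to-scale mapping, and all parameter values (`sigma_base`, `boost`) are assumptions made for the example; the paper's actual weighting scheme and accounting may differ.

```python
import numpy as np

def atdp_noise_scales(sensitivity, sigma_base=0.1, boost=4.0):
    """Hypothetical per-token noise scales: tokens with sensitivity
    near 1 (e.g. flagged by red-teaming) get amplified noise, while
    non-sensitive tokens stay near the baseline scale sigma_base."""
    sensitivity = np.asarray(sensitivity, dtype=float)
    return sigma_base * (1.0 + (boost - 1.0) * sensitivity)

def atdp_step(per_token_grads, sensitivity, clip_norm=1.0,
              sigma_base=0.1, boost=4.0, rng=None):
    """One ATDP-style update sketch: clip each token's gradient as in
    vanilla DP-SGD, then add Gaussian noise whose standard deviation
    is token-dependent rather than uniform."""
    rng = np.random.default_rng(rng)
    grads = np.asarray(per_token_grads, dtype=float)
    # Standard DP-SGD step 1: clip each per-token gradient to clip_norm.
    norms = np.linalg.norm(grads, axis=-1, keepdims=True)
    grads = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # ATDP twist: noise scale varies per token with its sensitivity,
    # instead of a single uniform sigma for the whole batch.
    scales = atdp_noise_scales(sensitivity, sigma_base, boost)[:, None]
    noisy = grads + rng.normal(0.0, 1.0, grads.shape) * scales * clip_norm
    # Aggregate the noisy per-token gradients into one update direction.
    return noisy.mean(axis=0)
```

Under this mapping a fully sensitive token (sensitivity 1.0) receives `boost` times the baseline noise of a non-sensitive token (sensitivity 0.0), which is one simple way to realize "stronger perturbation on sensitive tokens, milder on non-sensitive ones".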
Problem

Research questions and friction points this paper is trying to address.

LLMs memorize sensitive data requiring privacy protection
Uniform DP noise slows training and reduces model accuracy
Targeted noise on sensitive tokens improves privacy and efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive token-weighted noise for sensitive gradients
Early training noise injection disrupts memorization rapidly
Lightweight post-processing minimizes computational overhead significantly
Manjiang Yu
The University of Queensland
Priyanka Singh
Lecturer of Cyber Security, University of Queensland
Cyber Security · Multimedia Forensics · Encrypted Domain Processing
Xue Li
The University of Queensland
Yang Cao
Institute of Science Tokyo