Mitigating Gradient Inversion Risks in Language Models via Token Obfuscation

📅 2026-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of language models to gradient inversion attacks in collaborative learning settings, which can leak private training data. To mitigate this risk, the authors propose GHOST, a defense mechanism that replaces original tokens with "shadow tokens": semantically distinct yet embedding-proximate substitutes. A multi-criteria search identifies candidate tokens, and an internal-output alignment step selects substitutions that preserve consistency in the embedding and gradient spaces while severing the intrinsic link among tokens, their embeddings, and the resulting gradients. Extensive experiments show that GHOST effectively blocks attack pathways, reducing data recovery rates to as low as 1% across models ranging from BERT to Llama, while maintaining high utility (a classification F1 score of 0.92 and a generation perplexity of 5.45), offering a strong balance between privacy preservation and model performance.

📝 Abstract
Training and fine-tuning large-scale language models benefit greatly from collaborative learning, but the approach has been proven vulnerable to gradient inversion attacks (GIAs), which allow adversaries to reconstruct private training data from shared gradients. Existing defenses mainly employ gradient perturbation techniques, e.g., noise injection or gradient pruning, to disrupt GIAs' direct mapping from gradient space to token space. However, these methods often fall short because semantic similarity is retained across the gradient, embedding, and token spaces. In this work, we propose a novel defense mechanism named GHOST (gradient shield with obfuscated tokens), a token-level obfuscation mechanism that neutralizes GIAs by decoupling the inherent connections across gradient, embedding, and token spaces. GHOST is built upon an important insight: due to the large scale of the token space, there exist semantically distinct yet embedding-proximate tokens that can serve as shadow substitutes for the original tokens, enabling a semantic disconnection in the token space while preserving the connection in the embedding and gradient spaces. GHOST comprises a searching step, which identifies semantically distinct candidate tokens via a multi-criteria search, and a selection step, which selects optimal shadow tokens that minimally disrupt features critical for training by preserving alignment with the internal outputs produced by the original tokens. Evaluation across diverse model architectures (from BERT to Llama) and datasets demonstrates the remarkable effectiveness of GHOST in protecting privacy (recovery rates as low as 1%) and preserving utility (up to 0.92 classification F1 and as low as 5.45 perplexity), in both classification and generation tasks against state-of-the-art GIAs and adaptive attack scenarios.
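The two-step mechanism the abstract describes (a search for embedding-proximate but semantically distinct candidates, followed by selection of the candidate best aligned with the original token's internal outputs) can be sketched roughly as follows. This is a minimal illustration only: `find_shadow_token`, the toy `hidden_fn`, and the random embedding matrix are assumptions for the sketch, not the paper's actual implementation, and the semantic-distinctness filter is reduced here to a caller-supplied exclusion set.

```python
import numpy as np

def find_shadow_token(emb, token_id, hidden_fn, k=5, exclude=()):
    """Toy two-step shadow-token substitution in the spirit of GHOST.

    Step 1 (search): rank all other tokens by embedding proximity
    (cosine similarity) to the original token, skipping the token itself
    and any `exclude`d near-synonyms (a stand-in for the paper's
    multi-criteria semantic-distinctness search).
    Step 2 (select): among the top-k candidates, pick the one whose
    internal output (here, an arbitrary `hidden_fn`) stays closest to
    the output produced by the original token's embedding.
    """
    v = emb[token_id]
    norms = np.linalg.norm(emb, axis=1) * np.linalg.norm(v)
    sims = emb @ v / np.maximum(norms, 1e-12)   # cosine similarity
    sims[token_id] = -np.inf                    # never pick the token itself
    for t in exclude:                           # drop semantically close tokens
        sims[t] = -np.inf
    candidates = np.argsort(sims)[::-1][:k]     # top-k embedding-proximate
    ref = hidden_fn(v)
    dists = [np.linalg.norm(hidden_fn(emb[c]) - ref) for c in candidates]
    return int(candidates[int(np.argmin(dists))])

# Toy usage: 100-token vocabulary, 16-dim embeddings, one tanh layer
# standing in for the model's internal outputs.
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 16))
W = rng.normal(size=(16, 16))
shadow = find_shadow_token(emb, 7, hidden_fn=lambda x: np.tanh(x @ W))
```

The gradients shared with collaborators would then be computed on the shadow token `shadow` rather than token 7, so an inversion attack recovers the shadow sequence instead of the private one.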
Problem

Research questions and friction points this paper is trying to address.

gradient inversion attacks
privacy leakage
language models
collaborative learning
token space
Innovation

Methods, ideas, or system contributions that make the work stand out.

gradient inversion attack
token obfuscation
privacy-preserving machine learning
embedding space
collaborative learning