🤖 AI Summary
This work identifies a novel privacy-leakage risk in large language models (LLMs) under cross-lingual settings: personally identifiable information (PII) can be inadvertently disclosed via multilingual queries, even when the model is trained exclusively on monolingual (e.g., English) data, because mid-layer representations are largely shared across languages and are only converted into language-specific form at the output layers. To address this, the authors propose the first fine-grained categorization of privacy-relevant neurons into "privacy-universal neurons" and "language-specific privacy neurons." Building on this categorization, they design an intervention framework that integrates neuron-level attribution analysis, cross-lingual representation disentanglement, and targeted neuron deactivation, enabling language-granular privacy control. Experiments demonstrate that the method reduces cross-lingual PII leakage by 23.3%–31.6% and outperforms existing baselines on multilingual PII-extraction tasks.
📝 Abstract
Large Language Models (LLMs) trained on massive corpora capture rich information from their training data. However, this also introduces the risk of privacy leakage, particularly of personally identifiable information (PII). Although previous studies have shown that this risk can be mitigated through methods such as privacy-neuron editing, they all assume that both the (sensitive) training data and the user queries are in English. We show that such methods cannot defend against privacy leakage in cross-lingual contexts: even if the training data is exclusively in one language, a model may still reveal private information when queried in another language. In this work, we first investigate the information flow behind cross-lingual privacy leakage to better understand the phenomenon. We find that LLMs process private information in the middle layers, where representations are largely shared across languages; the risk of leakage peaks when these representations are converted to a language-specific space in the later layers. Based on this, we identify privacy-universal neurons and language-specific privacy neurons: privacy-universal neurons influence privacy leakage across all languages, while language-specific privacy neurons affect only specific languages. By deactivating these neurons, the cross-lingual privacy leakage risk is reduced by 23.3%–31.6%.
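The pipeline the abstract describes — score neurons for their contribution to emitting a private token, then deactivate the top-scoring ones — can be sketched on a toy feed-forward layer. Everything below (the tiny random MLP, the activation-times-weight relevance score, all sizes and names) is an illustrative assumption, not the paper's actual attribution method or models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one transformer FFN block (NOT the paper's model):
# hidden = relu(x @ W1); logits = hidden @ W2, softmaxed over a tiny "vocab".
d_model, d_hidden, vocab = 8, 16, 10
W1 = rng.normal(size=(d_model, d_hidden))
W2 = rng.normal(size=(d_hidden, vocab))

def forward(x, mask=None):
    hidden = np.maximum(x @ W1, 0.0)   # ReLU activation, one value per neuron
    if mask is not None:
        hidden = hidden * mask         # deactivate (zero out) selected neurons
    logits = hidden @ W2
    exp = np.exp(logits - logits.max())
    return hidden, exp / exp.sum()     # softmax over the toy vocabulary

x = rng.normal(size=d_model)           # stand-in for a query representation
target = 3                             # index of the hypothetical "private" token

hidden, probs = forward(x)

# Simple relevance score: each neuron's activation times its weight into the
# target logit. (An illustrative proxy for neuron-level attribution.)
scores = hidden * W2[:, target]
top = np.argsort(scores)[-4:]          # the most "privacy-relevant" neurons

# Targeted deactivation: zero those neurons and re-check the target token.
mask = np.ones(d_hidden)
mask[top] = 0.0
_, probs_masked = forward(x, mask)

print(f"p(private token) before intervention: {probs[target]:.4f}")
print(f"p(private token) after  intervention: {probs_masked[target]:.4f}")
```

In the paper's actual setting the scores would be computed per language, separating neurons whose relevance is high across all query languages (privacy-universal) from those relevant only under one language (language-specific), so deactivation can be applied at language granularity.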