🤖 AI Summary
This work identifies a novel privacy-leakage risk in large language models (LLMs) under cross-lingual settings: personally identifiable information (PII) can be inadvertently disclosed via multilingual queries, even when the model is trained exclusively on monolingual (e.g., English) data, because mid-layer representations are largely shared across languages and are only converted into language-specific form at the output layers. To address this, the authors propose the first fine-grained categorization of privacy-relevant neurons into "privacy-universal neurons" and "language-specific privacy neurons." Building on this categorization, they design an intervention framework that integrates neuron-level attribution analysis, cross-lingual representation disentanglement, and targeted neuron deactivation, enabling language-granular privacy control. Experiments demonstrate that the method reduces cross-lingual PII leakage by 23.3%–31.6% and outperforms existing baselines on multilingual PII-extraction tasks.
📝 Abstract
Large Language Models (LLMs) trained on massive corpora capture rich information from their training data. However, this also introduces the risk of privacy leakage, particularly of personally identifiable information (PII). Although previous studies have shown that this risk can be mitigated through methods such as privacy-neuron editing, they all assume that both the (sensitive) training data and the user queries are in English. We show that such methods cannot defend against privacy leakage in cross-lingual contexts: even if the training data is exclusively in one language, a model may still reveal private information when queried in another language. In this work, we first investigate the information flow behind cross-lingual privacy leakage to better understand the phenomenon. We find that LLMs process private information in the middle layers, where representations are largely shared across languages; the risk of leakage peaks when these representations are converted to a language-specific space in the later layers. Based on this, we identify privacy-universal neurons and language-specific privacy neurons: privacy-universal neurons influence privacy leakage across all languages, while language-specific privacy neurons affect only specific languages. By deactivating these neurons, the cross-lingual privacy leakage risk is reduced by 23.3%–31.6%.
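The pipeline the abstract describes — score neurons for their contribution to emitting a private token, then deactivate the top-scoring ones — can be sketched on a toy feed-forward layer. Everything below (the tiny random MLP, the activation-times-weight relevance score, all sizes and names) is an illustrative assumption, not the paper's actual attribution method or models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one transformer FFN block (NOT the paper's model):
# hidden = relu(x @ W1); logits = hidden @ W2, softmaxed over a tiny "vocab".
d_model, d_hidden, vocab = 8, 16, 10
W1 = rng.normal(size=(d_model, d_hidden))
W2 = rng.normal(size=(d_hidden, vocab))

def forward(x, mask=None):
    hidden = np.maximum(x @ W1, 0.0)   # ReLU activation, one value per neuron
    if mask is not None:
        hidden = hidden * mask         # deactivate (zero out) selected neurons
    logits = hidden @ W2
    exp = np.exp(logits - logits.max())
    return hidden, exp / exp.sum()     # softmax over the toy vocabulary

x = rng.normal(size=d_model)           # stand-in for a query representation
target = 3                             # index of the hypothetical "private" token

hidden, probs = forward(x)

# Simple relevance score: each neuron's activation times its weight into the
# target logit. (An illustrative proxy for neuron-level attribution.)
scores = hidden * W2[:, target]
top = np.argsort(scores)[-4:]          # the most "privacy-relevant" neurons

# Targeted deactivation: zero those neurons and re-check the target token.
mask = np.ones(d_hidden)
mask[top] = 0.0
_, probs_masked = forward(x, mask)

print(f"p(private token) before intervention: {probs[target]:.4f}")
print(f"p(private token) after  intervention: {probs_masked[target]:.4f}")
```

In the paper's actual setting the scores would be computed per language, separating neurons whose relevance is high across all query languages (privacy-universal) from those relevant only under one language (language-specific), so deactivation can be applied at language granularity.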