Leveraging Semantic Triples for Private Document Generation with Local Differential Privacy Guarantees

📅 2025-08-28

🤖 AI Summary

To address the severe utility degradation of text privatization under local differential privacy (LDP) at low privacy budgets (ε), this paper proposes DP-ST. First, semantic triples are extracted to build neighborhood-aware representations, shifting the differential privacy constraint from the raw word space to a semantic privatization neighborhood. Second, a neighborhood-constrained LDP mechanism privatizes these representations, so that the guarantee holds with respect to the privatization neighborhood even at low ε values (e.g., ε ≤ 1). Third, a large language model (LLM) post-processes the output for semantic consistency, producing coherent and usable private documents. Experiments show that DP-ST outperforms state-of-the-art methods under stringent LDP settings (ε ≤ 1), achieving a better trade-off between privacy preservation and semantic fidelity. Notably, DP-ST is presented as the first framework to enable high-fidelity text generation under strict LDP constraints.
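
The mechanism step above is not specified in detail here. As a minimal sketch, assuming the neighborhood is a fixed list of candidate triples and substituting the standard exponential mechanism (a swapped-in technique, not necessarily the one DP-ST uses), a single triple could be privatized as follows. The names `exponential_mechanism` and `token_overlap`, and the Jaccard token-overlap utility, are hypothetical choices for illustration.

```python
import math
import random
from typing import Callable, Optional, Sequence, Tuple

# A semantic triple: (subject, predicate, object).
Triple = Tuple[str, str, str]


def token_overlap(a: Triple, b: Triple) -> float:
    """Hypothetical utility: Jaccard overlap of lowercase tokens, in [0, 1]."""
    ta = set(" ".join(a).lower().split())
    tb = set(" ".join(b).lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def exponential_mechanism(
    true_triple: Triple,
    neighborhood: Sequence[Triple],
    utility: Callable[[Triple, Triple], float],
    epsilon: float,
    sensitivity: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Triple:
    """Sample one candidate with probability proportional to
    exp(epsilon * utility(true, candidate) / (2 * sensitivity)).

    With a utility bounded in [0, 1], sensitivity 1.0 is a valid bound, so the
    output is epsilon-indistinguishable for any two true inputs that share
    this (fixed) neighborhood.
    """
    if not neighborhood:
        raise ValueError("neighborhood must contain at least one candidate")
    rng = rng or random.Random()
    scores = [epsilon * utility(true_triple, c) / (2.0 * sensitivity) for c in neighborhood]
    top = max(scores)  # shift by the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(list(neighborhood), weights=weights, k=1)[0]


if __name__ == "__main__":
    neighborhood = [
        ("Alice", "works at", "a hospital"),
        ("Alice", "works at", "a clinic"),
        ("a nurse", "works at", "a clinic"),
        ("a person", "lives in", "a city"),
    ]
    rng = random.Random(0)
    for eps in (0.5, 1.0, 8.0):
        print(eps, exponential_mechanism(neighborhood[0], neighborhood, token_overlap, eps, rng=rng))
```

Low ε flattens the sampling distribution toward uniform over the neighborhood, while high ε concentrates it on the true triple. The candidate set must be fixed independently of which member is the true input; if the neighborhood itself were derived from the input, the ε bound would no longer hold.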

📝 Abstract

Many works at the intersection of Differential Privacy (DP) and Natural Language Processing aim to protect privacy by transforming texts under DP guarantees. This can be performed in a variety of ways, from word perturbations to full document rewriting, and most often under local DP. Here, an input text must be made indistinguishable from any other potential text, within some bound governed by the privacy parameter $\varepsilon$. Such a guarantee is quite demanding, and recent works show that privatizing texts under local DP can only be done reasonably under very high $\varepsilon$ values. Addressing this challenge, we introduce DP-ST, which leverages semantic triples for neighborhood-aware private document generation under local DP guarantees. Through the evaluation of our method, we demonstrate the effectiveness of the divide-and-conquer paradigm, particularly when limiting the DP notion (and privacy guarantees) to that of a privatization neighborhood. When combined with LLM post-processing, our method allows for coherent text generation even at lower $\varepsilon$ values, while still balancing privacy and utility. These findings highlight the importance of coherence in achieving balanced privatization outputs at reasonable $\varepsilon$ levels.
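
The indistinguishability requirement described above is the standard $\varepsilon$-LDP definition (first line). The second line is one way to write the neighborhood-restricted notion the abstract refers to; the notation $\mathcal{N}(x)$ for the privatization neighborhood of $x$ is an assumption for illustration, not the paper's own notation.

```latex
% Standard epsilon-local DP for a randomized mechanism M on input space X
\Pr[\mathcal{M}(x) \in S] \le e^{\varepsilon}\,\Pr[\mathcal{M}(x') \in S]
  \quad \forall\, x, x' \in \mathcal{X},\ \forall\, S \subseteq \operatorname{Range}(\mathcal{M})

% Neighborhood-restricted variant (assumed notation N(x))
\Pr[\mathcal{M}(x) \in S] \le e^{\varepsilon}\,\Pr[\mathcal{M}(x') \in S]
  \quad \forall\, x' \in \mathcal{N}(x),\ \forall\, S \subseteq \operatorname{Range}(\mathcal{M})
```

Narrowing the quantifier from all of $\mathcal{X}$ to $\mathcal{N}(x)$ is what makes the requirement less demanding: an observer still cannot reliably distinguish $x$ from texts in its neighborhood, but texts outside it are no longer covered by the bound. For scale, at $\varepsilon = 1$ the probability ratio is bounded by $e \approx 2.718$.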

Problem

Research questions and friction points this paper is trying to address.

Generate private documents under local differential privacy

Balance privacy and utility in text privatization

Achieve coherent text generation at low epsilon values

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging semantic triples for document generation

Restricting local differential privacy guarantees to a privatization neighborhood

Combining with LLM post-processing for coherence
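
The divide-and-conquer idea and the LLM step rest on two standard DP facts: under basic sequential composition, the privacy losses of separately privatized pieces add up, and any processing that only sees privatized output (the post-processing property) adds no further loss, provided the LLM never sees the original text. The sketch below illustrates that accounting under the assumption that each triple is privatized independently with its own budget; whether DP-ST splits its budget this way is not stated here. `split_budget`, `composed_epsilon`, and `verbalize` are hypothetical names, and `verbalize` is a simple template standing in for an LLM.

```python
from typing import List, Sequence, Tuple

# A semantic triple: (subject, predicate, object).
Triple = Tuple[str, str, str]


def split_budget(total_epsilon: float, k: int) -> List[float]:
    """Hypothetical helper: divide a document-level budget evenly over k triples."""
    if k <= 0:
        raise ValueError("k must be positive")
    if total_epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    return [total_epsilon / k] * k


def composed_epsilon(per_triple: Sequence[float]) -> float:
    """Basic sequential composition: the epsilons of independent mechanisms
    applied to parts of the same document add up."""
    if any(e < 0 for e in per_triple):
        raise ValueError("epsilon values must be non-negative")
    return float(sum(per_triple))


def verbalize(privatized: Sequence[Triple]) -> str:
    """Template stand-in for LLM post-processing. It reads only privatized
    triples, so it does not increase the privacy loss."""
    return " ".join(f"{s} {p} {o}." for s, p, o in privatized)


if __name__ == "__main__":
    budgets = split_budget(3.0, 3)
    print(budgets)                    # [1.0, 1.0, 1.0]
    print(composed_epsilon(budgets))  # 3.0
    print(verbalize([("A nurse", "works at", "a clinic")]))  # A nurse works at a clinic.
```

Swapping the template for an LLM leaves the composed ε unchanged under this accounting, as long as the model receives only the privatized triples.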

🔎 Similar Papers

InferDPT: Privacy-Preserving Inference for Black-box Large Language Model