Beyond Theoretical Bounds: Empirical Privacy Loss Calibration for Text Rewriting Under Local Differential Privacy

📅 2026-03-24
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses the limitation that the nominal local differential privacy (LDP) parameter ε fails to reflect the privacy actually afforded by text rewriting mechanisms, which hinders meaningful comparison of privacy–utility trade-offs across methods. To bridge this gap, the paper proposes TeDA, a framework presented as the first empirical cross-mechanism privacy calibration approach. TeDA evaluates the distinguishability of rewritten outputs in both their textual surface forms and embedding spaces through hypothesis testing, linking the theoretical ε to practical indistinguishability. Experiments show that mechanisms sharing the same nominal ε can exhibit substantially different empirical privacy losses, supporting TeDA's effectiveness in making privacy assessments for LDP-based text publishing more practical and comparable.
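To make the calibration idea concrete: under the standard hypothesis-testing view of pure LDP, any attack that tries to tell apart the mechanism's outputs on two adjacent inputs has error rates constrained by ε, so observed error rates certify a lower bound on the empirical privacy loss. The sketch below shows that generic conversion under stated assumptions; it is not TeDA's actual procedure (which the summary describes only at a high level), and the function name is ours.

```python
import numpy as np

def empirical_epsilon(fpr: float, fnr: float) -> float:
    """Epsilon lower bound implied by a distinguisher's error rates.

    For any test separating the outputs of an eps-LDP mechanism on two
    adjacent inputs, the false-positive rate alpha and false-negative
    rate beta must satisfy
        e**eps * alpha + beta >= 1   and   alpha + e**eps * beta >= 1,
    so observed (alpha, beta) certify
        eps >= max(log((1 - beta) / alpha), log((1 - alpha) / beta)).
    """
    alpha = max(fpr, 1e-12)  # clamp so a perfect attack yields a finite bound
    beta = max(fnr, 1e-12)
    return max(np.log((1 - beta) / alpha), np.log((1 - alpha) / beta), 0.0)

# e.g. an attack with 10% FPR and 20% FNR certifies eps >= log(8) ≈ 2.08
print(empirical_epsilon(0.10, 0.20))
```

A mechanism whose nominal ε is 6 but whose best observable attack only certifies ε ≈ 2 is, in this empirical sense, far less distinguishable than its bound suggests; comparing such certified values across mechanisms is what puts them on a common footing.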

📝 Abstract
The growing use of large language models has increased interest in sharing textual data in a privacy-preserving manner. One prominent line of work addresses this challenge through text rewriting under Local Differential Privacy (LDP), where input texts are locally obfuscated before release with formal privacy guarantees. These guarantees are typically expressed by a parameter $\varepsilon$ that upper bounds the worst-case privacy loss. However, nominal $\varepsilon$ values are often difficult to interpret and compare across mechanisms. In this work, we investigate how to empirically calibrate privacy loss across text rewriting mechanisms under LDP. We propose TeDA, which formulates calibration via a hypothesis-testing framework that instantiates text distinguishability audits in both surface and embedding spaces, enabling empirical assessment of indistinguishability from privatized texts. Applying this calibration to several representative mechanisms, we demonstrate that similar nominal $\varepsilon$ bounds can imply very different levels of distinguishability. Empirical calibration thus provides a more comparable footing for evaluating privacy-utility trade-offs, as well as a practical tool for mechanism comparison and analysis in real-world LDP text rewriting deployments.
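As one hedged illustration of what the embedding-space half of such a distinguishability audit might look like: train a binary classifier to separate privatized rewrites of two adjacent inputs, then convert its held-out error rates into an empirical ε via the hypothesis-testing bound sketched above. TeDA's actual instantiation (including its surface-form audit) may differ; here the embeddings are assumed precomputed, and scikit-learn's `LogisticRegression` stands in for whatever distinguisher the authors use.

```python
# Hypothetical embedding-space audit: emb_a and emb_b hold precomputed
# embeddings of many privatized rewrites of adjacent inputs A and B.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def audit_epsilon(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Empirical epsilon lower bound from a trained distinguisher."""
    X = np.vstack([emb_a, emb_b])
    y = np.concatenate([np.zeros(len(emb_a)), np.ones(len(emb_b))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=0
    )
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    # Held-out error rates of the attack: alpha = FPR, beta = FNR.
    alpha = max(np.mean(pred[y_te == 0] == 1), 1e-12)
    beta = max(np.mean(pred[y_te == 1] == 0), 1e-12)
    # Pure-LDP hypothesis-testing bound, as in the sketch above.
    return max(np.log((1 - beta) / alpha), np.log((1 - alpha) / beta), 0.0)
```

Running the same audit on two mechanisms with identical nominal $\varepsilon$ and observing very different certified values is exactly the kind of gap the abstract reports.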
Problem

Research questions and friction points this paper is trying to address.

Local Differential Privacy
Text Rewriting
Privacy Calibration
Empirical Privacy Loss
Privacy Parameter Interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Local Differential Privacy
Text Rewriting
Empirical Calibration
Privacy Loss
Distinguishability Audit
Weijun Li
School of Computing, FSE, Macquarie University, Sydney, Australia
Arnaud Grivet Sébert
School of Computing, FSE, Macquarie University, Sydney, Australia
Qiongkai Xu
Lecturer (Asst. Prof.) @ Macquarie University
Natural Language Processing · Machine Learning · Privacy and Security · Evaluation · Data Mining
Annabelle McIver
Macquarie University
Mark Dras
Professor of Computing, Macquarie University