Beyond Theoretical Bounds: Empirical Privacy Loss Calibration for Text Rewriting Under Local Differential Privacy

📅 2026-03-24
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses the limitation that the nominal local differential privacy (LDP) parameter ε fails to reflect the privacy actually afforded by text rewriting mechanisms, which hinders meaningful comparison of privacy–utility trade-offs across methods. To bridge this gap, the paper proposes TeDA, a framework presented as the first empirical cross-mechanism privacy calibration approach. TeDA evaluates the distinguishability of rewritten outputs in both their textual surface forms and embedding spaces through hypothesis testing, linking the theoretical ε to practical indistinguishability. Experiments show that mechanisms sharing the same nominal ε can exhibit substantially different empirical privacy losses, supporting TeDA's effectiveness in making privacy assessments for LDP-based text publishing more practical and comparable.
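To make the calibration idea concrete: under the standard hypothesis-testing view of pure LDP, any attack that tries to tell apart the mechanism's outputs on two adjacent inputs has error rates constrained by ε, so observed error rates certify a lower bound on the empirical privacy loss. The sketch below shows that generic conversion under stated assumptions; it is not TeDA's actual procedure (which the summary describes only at a high level), and the function name is ours.

```python
import numpy as np

def empirical_epsilon(fpr: float, fnr: float) -> float:
    """Epsilon lower bound implied by a distinguisher's error rates.

    For any test separating the outputs of an eps-LDP mechanism on two
    adjacent inputs, the false-positive rate alpha and false-negative
    rate beta must satisfy
        e**eps * alpha + beta >= 1   and   alpha + e**eps * beta >= 1,
    so observed (alpha, beta) certify
        eps >= max(log((1 - beta) / alpha), log((1 - alpha) / beta)).
    """
    alpha = max(fpr, 1e-12)  # clamp so a perfect attack yields a finite bound
    beta = max(fnr, 1e-12)
    return max(np.log((1 - beta) / alpha), np.log((1 - alpha) / beta), 0.0)

# e.g. an attack with 10% FPR and 20% FNR certifies eps >= log(8) ≈ 2.08
print(empirical_epsilon(0.10, 0.20))
```

A mechanism whose nominal ε is 6 but whose best observable attack only certifies ε ≈ 2 is, in this empirical sense, far less distinguishable than its bound suggests; comparing such certified values across mechanisms is what puts them on a common footing.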

📝 Abstract
The growing use of large language models has increased interest in sharing textual data in a privacy-preserving manner. One prominent line of work addresses this challenge through text rewriting under Local Differential Privacy (LDP), where input texts are locally obfuscated before release with formal privacy guarantees. These guarantees are typically expressed by a parameter $\varepsilon$ that upper bounds the worst-case privacy loss. However, nominal $\varepsilon$ values are often difficult to interpret and compare across mechanisms. In this work, we investigate how to empirically calibrate privacy loss across text rewriting mechanisms under LDP. We propose TeDA, which formulates calibration via a hypothesis-testing framework that instantiates text distinguishability audits in both surface and embedding spaces, enabling empirical assessment of indistinguishability from privatized texts. Applying this calibration to several representative mechanisms, we demonstrate that similar nominal $\varepsilon$ bounds can imply very different levels of distinguishability. Empirical calibration thus provides a more comparable footing for evaluating privacy-utility trade-offs, as well as a practical tool for mechanism comparison and analysis in real-world LDP text rewriting deployments.
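As one hedged illustration of what the embedding-space half of such a distinguishability audit might look like: train a binary classifier to separate privatized rewrites of two adjacent inputs, then convert its held-out error rates into an empirical ε via the hypothesis-testing bound sketched above. TeDA's actual instantiation (including its surface-form audit) may differ; here the embeddings are assumed precomputed, and scikit-learn's `LogisticRegression` stands in for whatever distinguisher the authors use.

```python
# Hypothetical embedding-space audit: emb_a and emb_b hold precomputed
# embeddings of many privatized rewrites of adjacent inputs A and B.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def audit_epsilon(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Empirical epsilon lower bound from a trained distinguisher."""
    X = np.vstack([emb_a, emb_b])
    y = np.concatenate([np.zeros(len(emb_a)), np.ones(len(emb_b))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=0
    )
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    # Held-out error rates of the attack: alpha = FPR, beta = FNR.
    alpha = max(np.mean(pred[y_te == 0] == 1), 1e-12)
    beta = max(np.mean(pred[y_te == 1] == 0), 1e-12)
    # Pure-LDP hypothesis-testing bound, as in the sketch above.
    return max(np.log((1 - beta) / alpha), np.log((1 - alpha) / beta), 0.0)
```

Running the same audit on two mechanisms with identical nominal $\varepsilon$ and observing very different certified values is exactly the kind of gap the abstract reports.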
Problem

Research questions and friction points this paper is trying to address.

Local Differential Privacy
Text Rewriting
Privacy Calibration
Empirical Privacy Loss
Privacy Parameter Interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Local Differential Privacy
Text Rewriting
Empirical Calibration
Privacy Loss
Distinguishability Audit
Weijun Li
School of Computing, FSE, Macquarie University, Sydney, Australia
Arnaud Grivet Sébert
School of Computing, FSE, Macquarie University, Sydney, Australia
Qiongkai Xu
Lecturer (Asst. Prof.) @ Macquarie University
Natural Language Processing · Machine Learning · Privacy and Security · Evaluation · Data Mining
Annabelle McIver
Macquarie University
Mark Dras
Professor of Computing, Macquarie University