🤖 AI Summary
This study addresses how to perturb full-length texts under differential privacy constraints while preserving both privacy and utility. The authors systematically evaluate combinations of text chunking strategies and privacy budget allocation mechanisms, and show that these choices strongly influence the performance of differentially private text obfuscation. Under comparable total privacy budgets, different chunking approaches and ε allocation schemes produce substantially different obfuscation results. The findings indicate that optimizing the perturbation pipeline can improve the privacy–utility trade-off, and they offer empirical evidence relevant to the practical deployment of differentially private text release mechanisms.
📝 Abstract
The goal of differentially private text obfuscation is to obfuscate, or "perturb", input texts with Differential Privacy (DP) guarantees, such that the private output texts are quantifiably indistinguishable from the originals. While perturbation at the word level is intuitive, meaningful text privatization happens on complete documents. Recent research has laid the groundwork for reasoning about privacy budget distribution, namely, how an overall $\varepsilon$ budget can be sensibly distributed among the component pieces of a text. We perform a systematic evaluation of multiple text decomposition and budget distribution techniques in the context of DP text obfuscation, testing how different methods for chunking texts can be combined with techniques for allocating $\varepsilon$ to these chunks. Our experiments reveal that such design choices matter considerably: even with comparable privacy budgets, the results differ significantly depending on which methods are chosen. These results provide credible evidence that empirical trade-offs can be improved by optimizing DP obfuscation procedures.
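To make the idea of chunking and budget distribution concrete, the following is a minimal sketch and not the procedure evaluated in the paper. It assumes basic sequential composition, so the per-chunk budgets $\varepsilon_i$ must sum to the overall $\varepsilon$. The function names (`chunk_by_sentence`, `chunk_by_window`, `allocate_uniform`, `allocate_by_length`) and the two allocation rules are hypothetical illustrations; the actual DP perturbation mechanism applied to each chunk is deliberately left out.

```python
import re


def chunk_by_sentence(text):
    """Split text into sentence-like chunks at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_by_window(text, size=5):
    """Split text into consecutive windows of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def allocate_uniform(chunks, total_eps):
    """Give every chunk the same share of the total budget."""
    if not chunks:
        raise ValueError("no chunks to allocate a budget to")
    return [total_eps / len(chunks)] * len(chunks)


def allocate_by_length(chunks, total_eps):
    """Give each chunk a share proportional to its word count (illustrative rule only)."""
    lengths = [len(c.split()) for c in chunks]
    total = sum(lengths)
    if total == 0:
        raise ValueError("no words to allocate a budget to")
    return [total_eps * n / total for n in lengths]


if __name__ == "__main__":
    doc = "Alice lives in Berlin. She works at a small clinic near the river. Call her."
    for chunker in (chunk_by_sentence, chunk_by_window):
        chunks = chunker(doc)
        for allocator in (allocate_uniform, allocate_by_length):
            budgets = allocator(chunks, total_eps=15.0)
            # Each (chunk, budget) pair would then be passed to a DP text mechanism.
            print(chunker.__name__, allocator.__name__, list(zip(chunks, budgets)))
```

Every combination of a chunker and an allocator spends the same total budget, yet each chunk is perturbed at a different granularity and noise level. That is the kind of design choice the abstract reports as having a significant effect on results.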