🤖 AI Summary
Existing multidimensional Laplace-based differential privacy (DP) text sanitization methods for mitigating prompt-level privacy leakage in LLM online services suffer from unpredictable utility, leading to ineffective API calls and wasted computational resources.
Method: We propose the first utility-aware pre-deployment prediction architecture, employing a lightweight neural model to locally estimate the downstream task performance of sanitized prompts—thereby preventing low-utility requests from being submitted to remote LLMs. Additionally, we conduct the first systematic analysis of how key implementation choices in distance-based text DP—particularly distance metrics and perturbation dimensions—significantly impact utility.
Results: Experiments on real-world LLM services demonstrate a ~12% reduction in invalid API calls, substantially lowering computational overhead and inference costs. Our work advances standardized, utility-conscious practice of DP for text.
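The gating idea can be sketched as follows. This is a hypothetical stand-in, not the paper's actual architecture: the feature extraction, the logistic predictor, and the threshold are all illustrative assumptions. The point is only the control flow — score a sanitized prompt locally, and submit it to the paid LLM API only if its predicted utility clears a threshold.

```python
import numpy as np

def features(prompt: str) -> np.ndarray:
    # Toy features: token count and type-token ratio of the sanitized prompt.
    # (A real predictor would use a learned representation.)
    tokens = prompt.split()
    return np.array([len(tokens), len(set(tokens)) / max(len(tokens), 1)])

class UtilityGate:
    """Illustrative utility gate: logistic score over toy features."""

    def __init__(self, weights, bias, threshold=0.5):
        self.w = np.asarray(weights, dtype=float)
        self.b = float(bias)
        self.threshold = threshold

    def predicted_utility(self, prompt: str) -> float:
        # Logistic score in (0, 1); hypothetical weights, not trained here.
        return float(1.0 / (1.0 + np.exp(-(self.w @ features(prompt) + self.b))))

    def should_send(self, prompt: str) -> bool:
        # Only prompts above the threshold trigger a (costly) remote API call.
        return self.predicted_utility(prompt) >= self.threshold

# Hypothetical, hand-picked parameters for demonstration only.
gate = UtilityGate(weights=[0.05, 2.0], bias=-2.0)
print(gate.should_send("summarize quarterly revenue figures for the board"))
```

A low-scoring sanitized prompt would instead be re-sanitized (e.g. with a fresh noise draw) or dropped, which is where the reported ~12% reduction in invalid API calls comes from.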
📝 Abstract
Interactions with online Large Language Models raise privacy issues: providers can gather sensitive information about users and their companies from the prompts. While Differential Privacy can be applied to textual prompts through the Multidimensional Laplace Mechanism, we show that it is difficult to anticipate the utility of such sanitized prompts. Poor utility has clear monetary consequences for LLM services charging on a pay-per-use model, and also wastes large amounts of computing resources. To this end, we propose an architecture that predicts the utility of a given sanitized prompt before it is sent to the LLM. We experimentally show that our architecture helps prevent such resource waste for up to 12% of the prompts. We also reproduce experiments from one of the most cited papers on distance-based DP for text sanitization and show that a performance-driven implementation choice, left implicit in the paper, completely changes the output.
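As background, the Multidimensional Laplace Mechanism for text typically perturbs a word's embedding with noise whose density is proportional to exp(-ε‖z‖), then maps the noisy vector back to the nearest vocabulary word. The sketch below follows that standard recipe (direction uniform on the sphere, magnitude drawn from Gamma(d, 1/ε)) on a toy two-word-vector vocabulary; the embeddings and ε value are illustrative, not from the paper.

```python
import numpy as np

def sample_multidim_laplace(dim: int, epsilon: float, rng) -> np.ndarray:
    """Sample noise with density proportional to exp(-epsilon * ||z||)."""
    # Direction: uniform on the unit sphere (normalized Gaussian draw).
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    # Magnitude: Gamma(shape=dim, scale=1/epsilon).
    magnitude = rng.gamma(shape=dim, scale=1.0 / epsilon)
    return direction * magnitude

def sanitize(word: str, embeddings: dict, epsilon: float, rng) -> str:
    """Perturb the word's embedding, then snap to the nearest vocabulary word."""
    vec = embeddings[word]
    noisy = vec + sample_multidim_laplace(vec.shape[0], epsilon, rng)
    # Nearest neighbour under Euclidean distance. Note: the distance metric
    # used here is exactly the kind of implementation choice whose impact
    # on utility the paper analyzes.
    return min(embeddings, key=lambda w: np.linalg.norm(embeddings[w] - noisy))

# Toy 2-d embedding table (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
emb = {
    "paris": np.array([1.0, 0.0]),
    "london": np.array([0.9, 0.2]),
    "cat": np.array([-1.0, -1.0]),
}
print(sanitize("paris", emb, epsilon=5.0, rng=rng))
```

Smaller ε means larger noise magnitudes on average, so the sanitized word lands farther from the original more often — which is precisely why downstream utility becomes hard to anticipate.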