🤖 AI Summary
PATE frameworks face a privacy–utility trade-off on output-diverse tasks (e.g., text generation): high output diversity reduces teacher consensus, which necessitates larger differential privacy (DP) noise and degrades utility; conversely, suppressing diversity discards the knowledge that large language models (LLMs) encode in their output distributions. This work introduces a privacy aggregation paradigm oriented toward distribution-valued outputs. We formally define "output diversity protection" and design a diversity transfer mechanism with zero additional privacy cost, enabling lightweight integration with black-box LLMs via API calls alone. Our method unifies differential privacy, teacher ensembling, probability-distribution distillation, and randomized response. On in-context learning tasks, it achieves orders-of-magnitude utility gains over conventional "cold" PATE while preserving end-to-end DP guarantees, and it remains fully compatible with existing deployment architectures.
📝 Abstract
The Private Aggregation of Teacher Ensembles (PATE) framework enables privacy-preserving machine learning by aggregating the responses of teacher models trained on disjoint subsets of sensitive data. Adaptations of PATE to tasks with inherent output diversity, such as text generation, face a core tension: preserving output diversity reduces teacher agreement, which in turn increases the noise required for differential privacy, degrading utility. Yet suppressing diversity is counterproductive, as modern large language models encapsulate knowledge in their output distributions. We propose Hot PATE, a variant tailored to settings where outputs are distributions. We formally define what it means to preserve diversity and introduce an efficient aggregation mechanism that transfers diversity to the randomized output without incurring additional privacy cost. Our method can be implemented with only API access to proprietary models and serves as a drop-in replacement for existing "cold" PATE aggregators. Empirically, Hot PATE achieves orders-of-magnitude improvements on in-context learning tasks.
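To make the tension concrete, here is a minimal sketch of a conventional "cold" PATE aggregator of the kind the abstract contrasts against: each teacher casts one discrete vote, noise is added to the vote counts for differential privacy, and only the noisy winner is released. The function name, the Gaussian noise choice, and the toy vote profiles are illustrative assumptions, not the paper's mechanism; the sketch only shows why a diverse vote profile (low consensus) is fragile under the noise that DP requires.

```python
import random
from collections import Counter

def cold_pate_aggregate(teacher_votes, sigma=1.0, seed=None):
    """Hypothetical "cold" PATE aggregation sketch: add Gaussian noise
    to each candidate's vote count and release only the noisy argmax.
    When votes are spread across many candidates (high output
    diversity), the per-candidate counts are small and noise easily
    flips the winner -- the utility loss Hot PATE aims to avoid."""
    rng = random.Random(seed)
    counts = Counter(teacher_votes)
    noisy = {cand: n + rng.gauss(0.0, sigma) for cand, n in counts.items()}
    return max(noisy, key=noisy.get)

# Strong consensus survives the noise: 9 of 10 teachers agree,
# so the 8-vote margin dwarfs sigma=1 noise.
votes = ["Paris"] * 9 + ["Lyon"]
print(cold_pate_aggregate(votes, sigma=1.0, seed=0))  # → Paris
```

Note that the aggregator collapses the teachers' output distributions to a single token; a "hot" aggregator would instead aim to carry the full distribution through to the released output without paying extra privacy cost.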