Hot PATE: Private Aggregation of Distributions for Diverse Task

📅 2023-12-04
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
📄 PDF
🤖 AI Summary
PATE frameworks face a privacy–utility trade-off on output-diverse tasks (e.g., text generation): high output diversity reduces teacher consensus, which forces larger differential privacy (DP) noise and degrades utility; conversely, suppressing diversity discards the knowledge that large language models (LLMs) encode in their output distributions. This work introduces a privacy-aggregation paradigm whose outputs are distributions rather than single labels. The authors formally define "output diversity protection" and design a diversity-transfer mechanism with zero additional privacy cost, enabling lightweight integration with black-box LLMs via API calls alone. The method combines differential privacy, teacher ensembling, distribution distillation, and randomized response. On in-context learning tasks it achieves orders-of-magnitude utility gains over conventional "cold" PATE while preserving end-to-end DP guarantees, and it remains fully compatible with existing deployment architectures.
📝 Abstract
The Private Aggregation of Teacher Ensembles (PATE) framework enables privacy-preserving machine learning by aggregating responses from teachers trained on disjoint subsets of sensitive data. Adaptations of PATE to tasks with inherent output diversity, such as text generation, face a core tension: preserving output diversity reduces teacher agreement, which in turn increases the noise required for differential privacy, degrading utility. Yet suppressing diversity is counterproductive, as modern large language models encapsulate knowledge in their output distributions. We propose Hot PATE, a variant tailored to settings where outputs are distributions. We formally define what it means to preserve diversity and introduce an efficient aggregation mechanism that transfers diversity to the randomized output without incurring additional privacy cost. Our method can be implemented with only API access to proprietary models and serves as a drop-in replacement for existing "cold" PATE aggregators. Empirically, Hot PATE achieves orders-of-magnitude improvement on in-context learning tasks.
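The "cold" PATE aggregator the abstract refers to is, in its standard form, a noisy argmax over teacher votes (as in the GNMax aggregator). The sketch below is an illustration of that baseline and its failure mode, not the paper's implementation: when teachers agree, consensus survives the DP noise, but when each teacher legitimately samples a different token from a diverse output distribution, the vote histogram flattens and the noise dominates. The vocabulary size, teacher count, and noise scale are arbitrary choices for the demo.

```python
import numpy as np

def cold_pate_aggregate(teacher_votes, num_classes, sigma, rng):
    """'Cold' PATE aggregation (GNMax-style): noisy argmax over teacher votes.

    sigma is the Gaussian noise scale: larger sigma gives stronger
    differential privacy but tolerates less disagreement among teachers.
    """
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    return int(np.argmax(counts + rng.normal(0.0, sigma, size=num_classes)))

rng = np.random.default_rng(0)
n_teachers, vocab, sigma = 200, 50, 20.0

# Low-diversity task: ~90% of teachers output token 7, so the vote
# histogram is sharply peaked and consensus survives the DP noise.
peaked_p = np.full(vocab, 0.1 / vocab)
peaked_p[7] += 0.9
peaked_votes = rng.choice(vocab, size=n_teachers, p=peaked_p)

# High-diversity task: each teacher samples (correctly!) from a near-uniform
# distribution, so the histogram is flat and the noise swamps the signal.
diverse_votes = rng.choice(vocab, size=n_teachers)

consensus = cold_pate_aggregate(peaked_votes, vocab, sigma, rng)      # recovers token 7
no_consensus = cold_pate_aggregate(diverse_votes, vocab, sigma, rng)  # essentially arbitrary
```

This is the tension the abstract describes: the only cold-PATE remedies are more noise (less utility) or forcing teachers toward a single answer (destroying the diversity the LLM's distribution carries).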
Problem

Research questions and friction points this paper is trying to address.

Balancing output diversity and privacy in PATE for text generation
Reducing noise in differential privacy while preserving distribution diversity
Enhancing utility of private aggregation for diverse output tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tailors PATE for diverse output distributions
Preserves diversity without extra privacy cost
Works with API access to proprietary models
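To make "preserves diversity without extra privacy cost" concrete, here is a hedged sketch of coordinated sampling, one way an ensemble can vote on a sample from a diverse distribution; the paper's actual mechanism may differ in detail. The idea: teachers draw from their output distributions using *shared* randomness (here via the Gumbel-max trick), so teachers with similar distributions emit the same token and consensus is restored, yet across queries the agreed-upon token is still distributed according to the underlying distribution.

```python
import numpy as np

def coordinated_sample(dist, shared_gumbel):
    """Sample from `dist` via the Gumbel-max trick with SHARED noise.

    argmax(log p + g) is a sample from p when g is fresh Gumbel noise;
    reusing one g across teachers makes identical distributions yield
    identical tokens, so diversity no longer destroys consensus.
    """
    return int(np.argmax(np.log(dist) + shared_gumbel))

rng = np.random.default_rng(1)
vocab, n_teachers = 50, 200

# A maximally diverse per-teacher distribution (uniform over the vocabulary).
dist = np.full(vocab, 1.0 / vocab)

shared = rng.gumbel(size=vocab)  # one noise vector, broadcast to every teacher
votes = [coordinated_sample(dist, shared) for _ in range(n_teachers)]
# All teachers cast the same vote, so the usual noisy aggregation can be
# applied with no extra privacy cost; a fresh `shared` per query makes the
# consensus token itself vary, transferring diversity to the output.
```

The design point is that the shared noise is not sensitive (it is independent of the data), so coordinating on it costs nothing in the privacy analysis.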