PREM: Privately Answering Statistical Queries with Relative Error

📅 2025-02-20

📈 Citations: 0

✨ Influential: 0

career value

203K/year

🤖 AI Summary

This paper addresses the challenge of controlling relative error in differentially private synthetic data publication. It proposes the PREM framework, which—under (ε,δ)-differential privacy—achieves the first (1±ζ) relative error guarantee for arbitrary query families ℱ. Methodologically, PREM integrates private optimization via multiplicative weights update (MWU), sensitivity tuning, Gaussian/Laplace noise injection, and adaptive query selection. Its key contribution is breaking classical lower-bound barriers: it reduces additive error to poly(log|ℱ|, log|𝒳|, log n, log(1/δ), 1/ε, 1/ζ), depending only on logarithmic parameters—not on the原始 domain size |𝒳| or query family cardinality |ℱ|. Theoretical analysis establishes a nearly tight lower bound that matches this bound, demonstrating significant improvements in practicality and accuracy—particularly in high-dimensional, sparse settings.

Technology Category

Application Category

📝 Abstract

We introduce $mathsf{PREM}$ (Private Relative Error Multiplicative weight update), a new framework for generating synthetic data that achieves a relative error guarantee for statistical queries under $(varepsilon, delta)$ differential privacy (DP). Namely, for a domain ${cal X}$, a family ${cal F}$ of queries $f : {cal X} o {0, 1}$, and $zeta>0$, our framework yields a mechanism that on input dataset $D in {cal X}^n$ outputs a synthetic dataset $widehat{D} in {cal X}^n$ such that all statistical queries in ${cal F}$ on $D$, namely $sum_{x in D} f(x)$ for $f in {cal F}$, are within a $1 pm zeta$ multiplicative factor of the corresponding value on $widehat{D}$ up to an additive error that is polynomial in $log |{cal F}|$, $log |{cal X}|$, $log n$, $log(1/delta)$, $1/varepsilon$, and $1/zeta$. In contrast, any $(varepsilon, delta)$-DP mechanism is known to require worst-case additive error that is polynomial in at least one of $n, |{cal F}|$, or $|{cal X}|$. We complement our algorithm with nearly matching lower bounds.

Problem

Research questions and friction points this paper is trying to address.

Achieving relative error in statistical queries

Ensuring differential privacy in data synthesis

Reducing additive error in synthetic datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Private Relative Error Mechanism

Synthetic data generation

Differential privacy guarantee

🔎 Similar Papers

PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding

2024-07-03PRIVATENLPCitations: 12

💼 Related Jobs

Research Engineer, Privacy

OpenAI

$380K – $445K • Offers Equity

San Francisco

Authors to Follow