The Sample Complexity of Membership Inference and Privacy Auditing

📅 2025-08-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work establishes a lower bound on the sample complexity of membership inference attacks (MIAs) in the Gaussian mean estimation setting, i.e., the minimum number of reference samples an attacker requires to reliably determine whether a given individual belongs to the training set. Using tools from statistics and information theory, the paper proves that $\Omega(n + n^2\rho^2)$ reference samples can be necessary for any attack that competes with a fully informed attacker, where $n$ is the training set size and $\rho$ parameterizes the estimation error of the released mean. This bound implies that practical MIAs, which rely on only $O(n)$ reference samples, can fall short whenever $\rho \gg 1/\sqrt{n}$, and so may systematically underestimate privacy risk. The result is the first to show that an attacker can need many more samples than the training algorithm itself uses, offering insights for privacy risk assessment frameworks and the design of privacy-preserving learning algorithms.
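To make the regimes of the lower bound concrete, one can instantiate $\Omega(n + n^2\rho^2)$ at a few illustrative error levels $\rho$ (the specific choices below are for illustration, not from the paper):

```latex
\begin{align*}
\rho = \Theta(1/\sqrt{n}) &\implies n^2\rho^2 = \Theta(n)       && \text{bound is } \Omega(n)\\
\rho = \Theta(n^{-1/4})   &\implies n^2\rho^2 = \Theta(n^{3/2}) && \text{bound is } \omega(n)\\
\rho = \Theta(1)          &\implies n^2\rho^2 = \Theta(n^2)     && \text{bound is } \Omega(n^2)
\end{align*}
```

In the latter two regimes the bound strictly exceeds $O(n)$, which is why attacks restricted to $O(n)$ reference samples cannot match a fully informed attacker there.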

📝 Abstract
A membership-inference attack gets the output of a learning algorithm, and a target individual, and tries to determine whether this individual is a member of the training data or an independent sample from the same distribution. A successful membership-inference attack typically requires the attacker to have some knowledge about the distribution that the training data was sampled from, and this knowledge is often captured through a set of independent reference samples from that distribution. In this work we study how much information the attacker needs for membership inference by investigating the sample complexity (the minimum number of reference samples required) for a successful attack. We study this question in the fundamental setting of Gaussian mean estimation where the learning algorithm is given $n$ samples from a Gaussian distribution $\mathcal{N}(\mu,\Sigma)$ in $d$ dimensions, and tries to estimate $\hat\mu$ up to some error $\mathbb{E}[\|\hat\mu - \mu\|^2_\Sigma] \leq \rho^2 d$. Our result shows that for membership inference in this setting, $\Omega(n + n^2\rho^2)$ samples can be necessary to carry out any attack that competes with a fully informed attacker. Our result is the first to show that the attacker sometimes needs many more samples than the training algorithm uses to train the model. This result has significant implications for practice, as all attacks used in practice have a restricted form that uses $O(n)$ samples and cannot benefit from $\omega(n)$ samples. Thus, these attacks may be underestimating the possibility of membership inference, and better attacks may be possible when information about the distribution is easy to obtain.
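To illustrate the setting the abstract describes, the following is a minimal sketch of a reference-sample attack of the restricted $O(n)$-sample form the paper refers to: an inner-product (correlation) score in the Gaussian mean-estimation setting. The dimensions, sample sizes, and score function here are illustrative assumptions, not the paper's construction or its lower-bound instance.

```python
import numpy as np

# Illustrative sketch (not the paper's construction): a simple inner-product
# membership-inference score for Gaussian mean estimation.
# n, d, m are assumed values chosen for demonstration.
rng = np.random.default_rng(0)
n, d, m = 100, 100, 100   # training size, dimension, attacker's reference samples

mu = rng.normal(size=d)                    # unknown true mean
train = rng.normal(mu, 1.0, size=(n, d))   # training data
mu_hat = train.mean(axis=0)                # released estimate (empirical mean)

ref = rng.normal(mu, 1.0, size=(m, d))     # attacker's independent reference samples
mu_ref = ref.mean(axis=0)

def mia_score(z):
    # Training points correlate with mu_hat after centering at the reference mean.
    return float(np.dot(z - mu_ref, mu_hat - mu_ref))

in_scores = [mia_score(x) for x in train]                                # members
out_scores = [mia_score(rng.normal(mu, 1.0, size=d)) for _ in range(n)]  # non-members
print(f"mean score, members: {np.mean(in_scores):.3f}, "
      f"non-members: {np.mean(out_scores):.3f}")
```

On average, members score higher than non-members, so thresholding the score yields an attack; the paper's point is that attacks of this restricted form, which summarize the reference data into a single $O(n)$-sample statistic, can be far from the fully informed optimum.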
Problem

Research questions and friction points this paper is trying to address.

Determining the sample complexity for membership inference attacks
Investigating reference samples needed for successful privacy attacks
Establishing lower bounds on attacker's information requirements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sample complexity analysis for membership inference
Gaussian mean estimation setting with d dimensions
Lower bound on reference samples for attacks
Mahdi Haghifam
Khoury College of Computer Sciences, Northeastern University
Adam Smith
Department of Computer Science, Boston University
Jonathan Ullman
Associate Professor of Computer Science, Northeastern University
Differential Privacy · Machine Learning Theory · Cryptography