Near-optimal algorithms for private estimation and sequential testing of collision probability

📅 2025-04-18
🤖 AI Summary
This paper addresses estimation and sequential testing of the collision probability of a discrete distribution. Existing methods have high sample complexity or need the separation between hypotheses to be specified in advance. The paper proposes two algorithms: (1) an estimator that satisfies (α, β)-local differential privacy (LDP) and reaches error ε with Õ(log(1/β)/(α²ε²)) samples for α ≤ 1, improving on prior work by a factor of 1/α²; and (2) a sequential testing algorithm that distinguishes collision probability values separated by ε using Õ(1/ε²) samples, even when ε is unknown. Both sample complexities are nearly optimal. The technical contributions combine a randomized response variant, empirical process analysis, and error control. In experiments, the methods require significantly fewer samples than previous methods.

📝 Abstract
We present new algorithms for estimating and testing \emph{collision probability}, a fundamental measure of the spread of a discrete distribution that is widely used in many scientific fields. We describe an algorithm that satisfies $(\alpha, \beta)$-local differential privacy and estimates collision probability with error at most $\epsilon$ using $\tilde{O}\left(\frac{\log(1/\beta)}{\alpha^2 \epsilon^2}\right)$ samples for $\alpha \le 1$, which improves over previous work by a factor of $\frac{1}{\alpha^2}$. We also present a sequential testing algorithm for collision probability, which can distinguish between collision probability values that are separated by $\epsilon$ using $\tilde{O}(\frac{1}{\epsilon^2})$ samples, even when $\epsilon$ is unknown. Our algorithms have nearly the optimal sample complexity, and in experiments we show that they require significantly fewer samples than previous methods.
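
To make the estimated quantity concrete: the collision probability of a distribution $p$ is $\sum_i p_i^2$, the chance that two independent draws coincide. The sketch below is a plain, non-private baseline and not the paper's algorithm; the function names are illustrative. It computes the exact value from known probabilities, and an unbiased estimate from samples as the fraction of sample pairs that collide.

```python
from collections import Counter
from typing import Hashable, Iterable


def collision_probability(probs: Iterable[float]) -> float:
    """Exact collision probability sum_i p_i^2 of a known distribution."""
    return sum(p * p for p in probs)


def estimate_collision_probability(samples: Iterable[Hashable]) -> float:
    """Unbiased estimate: fraction of unordered sample pairs that collide."""
    counts = Counter(samples)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two samples")
    # A symbol seen c times contributes c * (c - 1) / 2 colliding pairs.
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = n * (n - 1) // 2
    return colliding_pairs / total_pairs
```

For example, `estimate_collision_probability(["a", "a", "b", "c"])` finds one colliding pair out of six and returns `1/6`.
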
Problem

Research questions and friction points this paper is trying to address.

Estimating collision probability under local differential privacy
Sequential testing of collision probability when the separation $\epsilon$ is unknown
Achieving near-optimal sample complexity for both tasks
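
To make the private setting concrete, here is a minimal sketch of a textbook baseline, k-ary randomized response with debiasing. It is not the estimator proposed in the paper, and it makes no claim to the sample complexity stated in the abstract. It assumes a known domain $\{0, \dots, k-1\}$ and satisfies pure $\alpha$-local differential privacy (that is, $\beta = 0$); all function names are hypothetical. Each user reports the true symbol with probability $e^{\alpha}/(e^{\alpha} + k - 1)$ and each other symbol with probability $1/(e^{\alpha} + k - 1)$, so the collision probability of the reports is an affine function of the true one and can be inverted.

```python
import math
import random
from collections import Counter
from typing import Sequence, Tuple


def k_rr_probs(k: int, alpha: float) -> Tuple[float, float]:
    """Return (keep, flip): the probability of reporting the true symbol and
    the probability of reporting each specific other symbol."""
    denom = math.exp(alpha) + k - 1
    return math.exp(alpha) / denom, 1.0 / denom


def randomize(x: int, k: int, alpha: float, rng: random.Random) -> int:
    """Privatize one symbol x in {0, ..., k-1} on the user's side."""
    keep, _ = k_rr_probs(k, alpha)
    if rng.random() < keep:
        return x
    other = rng.randrange(k - 1)  # uniform over the k - 1 other symbols
    return other if other < x else other + 1


def debias_collision(c_reports: float, k: int, alpha: float) -> float:
    """Invert C(q) = k*flip^2 + 2*flip*gap + gap^2 * C(p), gap = keep - flip."""
    keep, flip = k_rr_probs(k, alpha)
    gap = keep - flip
    return (c_reports - k * flip ** 2 - 2 * flip * gap) / gap ** 2


def private_estimate(reports: Sequence[int], k: int, alpha: float) -> float:
    """Estimate the true collision probability from privatized reports."""
    n = len(reports)
    if n < 2:
        raise ValueError("need at least two reports")
    counts = Counter(reports)
    # Fraction of ordered report pairs that collide (unbiased for C(q)).
    c_reports = sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    return debias_collision(c_reports, k, alpha)
```

The division by `gap ** 2` inflates the noise when $\alpha$ is small or $k$ is large, which is one reason this baseline should be read as an illustration only.
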
Innovation

Methods, ideas, or system contributions that make the work stand out.

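An $(\alpha, \beta)$-locally differentially private estimator of collision probability that uses $\tilde{O}\left(\frac{\log(1/\beta)}{\alpha^2 \epsilon^2}\right)$ samples for $\alpha \le 1$
A sequential test that separates collision probability values $\epsilon$ apart using $\tilde{O}(\frac{1}{\epsilon^2})$ samples without knowing $\epsilon$
Nearly optimal sample complexity, with a $\frac{1}{\alpha^2}$ factor improvement over previous private estimation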
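
Testing without knowing $\epsilon$ can be illustrated with a generic anytime-valid test. The sketch below is a standard Hoeffding-plus-union-bound construction, not the paper's algorithm; the null value `c0`, the error level `delta`, and the function names are assumptions made for illustration. It reads disjoint pairs of samples, whose collision indicators are i.i.d. with mean equal to the collision probability, and rejects the hypothesis that the collision probability equals `c0` as soon as the running collision rate leaves a confidence band that holds at all times simultaneously with probability at least `1 - delta`.

```python
import math
from typing import Hashable, Iterable, Optional, Tuple


def confidence_radius(t: int, delta: float) -> float:
    """Hoeffding radius with failure budget delta / (t * (t + 1)) at step t;
    these budgets sum to delta over all t >= 1."""
    return math.sqrt(math.log(2 * t * (t + 1) / delta) / (2 * t))


def sequential_collision_test(
    stream: Iterable[Hashable],
    c0: float,
    delta: float = 0.05,
    max_pairs: Optional[int] = None,
) -> Tuple[bool, int]:
    """Return (rejected, pairs_used) for the null 'collision probability == c0'."""
    it = iter(stream)
    collisions = 0
    t = 0
    # zip(it, it) consumes the stream two samples at a time (disjoint pairs).
    for t, (a, b) in enumerate(zip(it, it), start=1):
        collisions += (a == b)
        if abs(collisions / t - c0) > confidence_radius(t, delta):
            return True, t
        if max_pairs is not None and t >= max_pairs:
            break
    return False, t
```

Because the band shrinks as more pairs arrive, the test stops sooner when the true value is farther from `c0`, without being told the gap in advance.
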
R. Busa-Fekete
Google Research, NY, USA
Umar Syed
Research Scientist, Google
Machine Learning