🤖 AI Summary
This paper addresses the efficient estimation and sequential testing of collision probabilities for discrete distributions under local differential privacy (LDP). To overcome two limitations of existing methods, namely high sample complexity and reliance on prior knowledge of the accuracy parameter, the authors propose two algorithms: (1) the first LDP collision probability estimator achieving near-optimal sample complexity Õ(1/(α²ε²)), improving on prior work by a factor of 1/α²; and (2) an adaptive sequential testing algorithm that does not require prespecifying the accuracy parameter ε, attaining near-optimal Õ(1/ε²) sample complexity even when ε is unknown. The technical contributions combine a randomized response variant, empirical process analysis, and rigorous error control. Experiments show that both methods require significantly fewer samples than previous baselines.
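To make the quantity concrete: the collision probability of a discrete distribution p is Σᵢ pᵢ², the chance that two independent draws coincide. The sketch below shows the exact quantity and the classical non-private plug-in estimator (fraction of colliding sample pairs); this is background illustration only, not the paper's private or sequential algorithm.

```python
from itertools import combinations

def collision_probability(p):
    """Exact collision probability sum_i p_i^2 of a discrete distribution p."""
    return sum(pi * pi for pi in p)

def estimate_collision_probability(samples):
    """Unbiased plug-in estimator: the fraction of unordered sample pairs
    that collide. Illustrative baseline only -- not the LDP estimator
    proposed in the paper."""
    n = len(samples)
    collisions = sum(1 for x, y in combinations(samples, 2) if x == y)
    return collisions / (n * (n - 1) / 2)
```

For example, on the samples `[0, 0, 1, 1]` there are 2 colliding pairs out of 6, giving an estimate of 1/3, while the true collision probability of the uniform distribution on two symbols is 1/2.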
📝 Abstract
We present new algorithms for estimating and testing \emph{collision probability}, a fundamental measure of the spread of a discrete distribution that is widely used in many scientific fields. We describe an algorithm that satisfies $(\alpha, \eta)$-local differential privacy and estimates collision probability with error at most $\epsilon$ using $\tilde{O}\left(\frac{\log(1/\eta)}{\alpha^2 \epsilon^2}\right)$ samples for $\alpha \le 1$, which improves over previous work by a factor of $\frac{1}{\alpha^2}$. We also present a sequential testing algorithm for collision probability, which can distinguish between collision probability values that are separated by $\epsilon$ using $\tilde{O}(\frac{1}{\epsilon^2})$ samples, even when $\epsilon$ is unknown. Our algorithms have nearly the optimal sample complexity, and in experiments we show that they require significantly fewer samples than previous methods.
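The summary mentions that the estimator builds on a randomized response variant. As background, here is a sketch of the standard $k$-ary randomized response mechanism, which satisfies $\alpha$-local differential privacy; the paper's actual mechanism is a variant of this and may differ in its details.

```python
import math
import random

def k_randomized_response(x, k, alpha, rng=random):
    """Standard k-ary randomized response, an alpha-LDP mechanism.

    Reports the true symbol x in {0, ..., k-1} with probability
    e^alpha / (e^alpha + k - 1); otherwise reports a uniformly random
    *other* symbol. Background illustration, not the paper's variant.
    """
    p_true = math.exp(alpha) / (math.exp(alpha) + k - 1)
    if rng.random() < p_true:
        return x
    # Draw uniformly from the k-1 symbols other than x.
    y = rng.randrange(k - 1)
    return y if y < x else y + 1
```

The ratio of reporting probabilities between any two inputs is at most $e^\alpha$, which is exactly the $\alpha$-LDP guarantee; as $\alpha \to \infty$ the mechanism reports the true symbol with probability approaching 1.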