🤖 AI Summary
This paper addresses the efficient estimation and sequential testing of collision probabilities for discrete distributions under local differential privacy (LDP). To overcome two limitations of existing methods, namely high sample complexity and reliance on prior knowledge of the accuracy parameter, the authors propose two algorithms: (1) the first LDP collision probability estimator achieving near-optimal sample complexity Õ(1/(α²ε²)), improving on prior work by a factor of 1/α²; and (2) an adaptive sequential testing algorithm that does not require prespecifying the accuracy parameter ε, attaining near-optimal Õ(1/ε²) sample complexity even when ε is unknown. The technical contributions combine a randomized response variant, empirical process analysis, and rigorous error control. Experiments show that both methods require significantly fewer samples than previous baselines.
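To make the quantity concrete: the collision probability of a discrete distribution p is Σᵢ pᵢ², the chance that two independent draws coincide. The sketch below shows the exact quantity and the classical non-private plug-in estimator (fraction of colliding sample pairs); this is background illustration only, not the paper's private or sequential algorithm.

```python
from itertools import combinations

def collision_probability(p):
    """Exact collision probability sum_i p_i^2 of a discrete distribution p."""
    return sum(pi * pi for pi in p)

def estimate_collision_probability(samples):
    """Unbiased plug-in estimator: the fraction of unordered sample pairs
    that collide. Illustrative baseline only -- not the LDP estimator
    proposed in the paper."""
    n = len(samples)
    collisions = sum(1 for x, y in combinations(samples, 2) if x == y)
    return collisions / (n * (n - 1) / 2)
```

For example, on the samples `[0, 0, 1, 1]` there are 2 colliding pairs out of 6, giving an estimate of 1/3, while the true collision probability of the uniform distribution on two symbols is 1/2.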
📝 Abstract
We present new algorithms for estimating and testing \emph{collision probability}, a fundamental measure of the spread of a discrete distribution that is widely used in many scientific fields. We describe an algorithm that satisfies $(\alpha, \eta)$-local differential privacy and estimates collision probability with error at most $\epsilon$ using $\tilde{O}\left(\frac{\log(1/\eta)}{\alpha^2 \epsilon^2}\right)$ samples for $\alpha \le 1$, which improves over previous work by a factor of $\frac{1}{\alpha^2}$. We also present a sequential testing algorithm for collision probability, which can distinguish between collision probability values that are separated by $\epsilon$ using $\tilde{O}(\frac{1}{\epsilon^2})$ samples, even when $\epsilon$ is unknown. Our algorithms have nearly the optimal sample complexity, and in experiments we show that they require significantly fewer samples than previous methods.
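The summary mentions that the estimator builds on a randomized response variant. As background, here is a sketch of the standard $k$-ary randomized response mechanism, which satisfies $\alpha$-local differential privacy; the paper's actual mechanism is a variant of this and may differ in its details.

```python
import math
import random

def k_randomized_response(x, k, alpha, rng=random):
    """Standard k-ary randomized response, an alpha-LDP mechanism.

    Reports the true symbol x in {0, ..., k-1} with probability
    e^alpha / (e^alpha + k - 1); otherwise reports a uniformly random
    *other* symbol. Background illustration, not the paper's variant.
    """
    p_true = math.exp(alpha) / (math.exp(alpha) + k - 1)
    if rng.random() < p_true:
        return x
    # Draw uniformly from the k-1 symbols other than x.
    y = rng.randrange(k - 1)
    return y if y < x else y + 1
```

The ratio of reporting probabilities between any two inputs is at most $e^\alpha$, which is exactly the $\alpha$-LDP guarantee; as $\alpha \to \infty$ the mechanism reports the true symbol with probability approaching 1.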