🤖 AI Summary
Existing differentially private (DP) DBSCAN algorithms only release cluster labels, which offer essentially no practical utility. Method: We propose replacing labels with "spans," a new output representation for DP-DBSCAN, introducing the span concept to DP clustering for the first time. Our approach employs a linear-time algorithm combining DP histograms and geometric partitioning, enabling efficient private clustering in any constant-dimensional space. Contribution/Results: We prove the algorithm achieves a sandwich quality guarantee and establish matching lower bounds on its approximation ratio. Experiments on synthetic and real-world datasets demonstrate both high clustering utility and computational efficiency. This work removes a key practicality bottleneck in DP clustering and offers a new formulation for privacy-preserving density-based clustering.
📝 Abstract
This paper revisits the DBSCAN problem under differential privacy (DP). Existing DP-DBSCAN algorithms aim to publish the cluster labels of the input points. However, we show, both empirically and theoretically, that this approach cannot offer any utility in the published results. We therefore propose an alternative definition of DP-DBSCAN based on the notion of spans. We argue that publishing the spans better serves DBSCAN's main purposes, namely visualization and classification. We then present a linear-time DP-DBSCAN algorithm that achieves the sandwich quality guarantee in any constant dimension, together with matching lower bounds on the approximation ratio. A key building block of our algorithm is a linear-time algorithm for constructing a histogram under pure DP, which is of independent interest. Finally, we conduct experiments on both synthetic and real-world datasets to verify the practical performance of our DP-DBSCAN algorithm.
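For context on why a linear-time pure-DP histogram is notable, here is a minimal sketch of the textbook baseline it improves upon. This is not the paper's algorithm, which the abstract does not describe. The sketch applies the standard Laplace mechanism to a grid over a public, bounded domain, and assumes add/remove-one-point neighboring datasets, so each count has L1 sensitivity 1. The names `dp_grid_histogram`, `side`, and `bounds` are hypothetical. Counting points takes O(n) time, but pure DP requires noising every cell, including empty ones, so the total cost scales with the number of grid cells rather than with n.

```python
import math
import random
from collections import Counter
from itertools import product


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_grid_histogram(points, side, bounds, epsilon, seed=None):
    """Baseline epsilon-DP grid histogram via the Laplace mechanism (sketch).

    points  : iterable of d-dimensional tuples, assumed to lie inside `bounds`
    side    : grid cell side length (hypothetical parameter)
    bounds  : list of (lo, hi) per dimension; the domain must be public
    epsilon : privacy budget; neighbors differ by adding/removing one point,
              so each point changes exactly one count and sensitivity is 1
    """
    rng = random.Random(seed)
    # Number of cells along each axis of the public, bounded domain.
    cells_per_axis = [max(1, math.ceil((hi - lo) / side)) for lo, hi in bounds]

    # Exact counts in O(n) time: map each point to its cell index.
    exact = Counter()
    for p in points:
        cell = tuple(
            min(int((x - lo) // side), k - 1)  # clamp points on the upper edge
            for x, (lo, _), k in zip(p, bounds, cells_per_axis)
        )
        exact[cell] += 1

    # Noise goes on EVERY cell, including empty ones: publishing only
    # non-empty cells would reveal which cells are occupied and break pure DP.
    # This loop costs time proportional to the number of cells, not to n.
    scale = 1.0 / epsilon
    return {
        cell: exact[cell] + laplace(scale, rng)
        for cell in product(*(range(k) for k in cells_per_axis))
    }
```

For example, `dp_grid_histogram([(0.1, 0.2), (0.15, 0.25), (0.9, 0.9)], side=0.5, bounds=[(0, 1), (0, 1)], epsilon=1.0, seed=0)` returns noisy counts for all four cells of a 2 x 2 grid. The two empty cells still receive noise. Avoiding this dependence on the domain size while keeping pure DP is the kind of cost that a linear-time construction has to overcome.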