Approximate DBSCAN under Differential Privacy

📅 2025-08-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing differentially private (DP) DBSCAN algorithms release only the cluster labels of the input points, which the paper shows offers no meaningful utility, both empirically and theoretically. Method: The paper proposes an alternative definition of DP-DBSCAN that publishes “spans” instead of labels, arguing that spans better serve DBSCAN's visualization and classification purposes. The approach is a linear-time algorithm that combines DP histograms with geometric partitioning and works in any constant dimension. Contribution/Results: The algorithm achieves the sandwich quality guarantee, and the paper proves matching lower bounds on the approximation ratio. Experiments on synthetic and real-world datasets evaluate its clustering utility and running time. The linear-time pure-DP histogram construction used as a building block is of independent interest.

📝 Abstract

This paper revisits the DBSCAN problem under differential privacy (DP). Existing DP-DBSCAN algorithms aim to publish the cluster labels of the input points. However, we show, both empirically and theoretically, that this approach cannot offer any utility in the published results. We therefore propose an alternative definition of DP-DBSCAN based on the notion of spans. We argue that publishing the spans better serves the visualization and classification purposes of DBSCAN. We then present a linear-time DP-DBSCAN algorithm that achieves the sandwich quality guarantee in any constant dimension, together with matching lower bounds on the approximation ratio. A key building block of our algorithm is a linear-time algorithm for constructing a histogram under pure DP, which is of independent interest. Finally, we conduct experiments on both synthetic and real-world datasets to verify the practical performance of our DP-DBSCAN algorithm.
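To make the pure-DP histogram building block concrete, here is a minimal sketch of the textbook Laplace mechanism on a grid histogram. It is not the paper's linear-time construction: this naive version adds noise to every cell of a bounded grid, so its cost grows with the number of cells rather than the number of points. The function names, the bounded domain, the `cell_width` and `cells_per_dim` parameters, and the add/remove-one neighboring relation (L1 sensitivity 1) are all assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import product


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def cell_of(point, cell_width, cells_per_dim):
    # Map a point to its grid cell, clamping to the assumed bounded domain.
    return tuple(
        min(cells_per_dim - 1, max(0, int(x // cell_width))) for x in point
    )


def dp_grid_histogram(points, dim, cell_width, cells_per_dim, epsilon, rng=None):
    """Naive epsilon-DP histogram: noisy count for every grid cell.

    Assumes neighboring datasets differ by adding or removing one point,
    so the histogram has L1 sensitivity 1 and Laplace(1/epsilon) noise
    per cell suffices. Empty cells must be noised too; skipping them
    would reveal which cells are occupied.
    """
    rng = rng or random.Random()
    counts = Counter(cell_of(p, cell_width, cells_per_dim) for p in points)
    scale = 1.0 / epsilon
    return {
        cell: counts[cell] + laplace(scale, rng)
        for cell in product(range(cells_per_dim), repeat=dim)
    }


# Hypothetical toy data on a 2 x 2 grid over [0, 1)^2.
points = [(0.1, 0.2), (0.15, 0.25), (0.9, 0.8)]
hist = dp_grid_histogram(
    points, dim=2, cell_width=0.5, cells_per_dim=2, epsilon=1.0,
    rng=random.Random(0),
)
```

Enumerating all `cells_per_dim ** dim` cells is what makes this version slow on fine grids, which is the cost a linear-time construction has to avoid.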

Problem

Research questions and friction points this paper is trying to address.

Addressing utility limitations in differentially private DBSCAN clustering

Proposing a span-based definition of DP-DBSCAN that serves visualization and classification

Developing a linear-time DP-DBSCAN algorithm with theoretical guarantees

Innovation

Methods, ideas, or system contributions that make the work stand out.

Alternative DP-DBSCAN definition using spans

Linear-time pure-DP histogram construction algorithm

Sandwich quality guarantee with matching bounds
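
For orientation, the sandwich guarantee is best known from non-private ρ-approximate DBSCAN, where the output is compared with exact DBSCAN at radii ε and (1+ρ)ε. The statement below is that standard non-private form, given as an assumption about what the term refers to here; the paper's DP version may use different parameters or allow extra slack, for example in MinPts.

```latex
% Non-private sandwich guarantee for \rho-approximate DBSCAN (assumed reference form).
% \mathcal{C}_{\varepsilon}: clusters of exact DBSCAN with radius \varepsilon (MinPts fixed).
% \tilde{\mathcal{C}}: clusters returned by the approximate algorithm.
\begin{align*}
\forall\, C_1 \in \mathcal{C}_{\varepsilon}\ \ \exists\, \tilde{C} \in \tilde{\mathcal{C}} &: \quad C_1 \subseteq \tilde{C}, \\
\forall\, \tilde{C} \in \tilde{\mathcal{C}}\ \ \exists\, C_2 \in \mathcal{C}_{(1+\rho)\varepsilon} &: \quad \tilde{C} \subseteq C_2 .
\end{align*}
```

In words, every approximate cluster sits between an exact cluster at the smaller radius and an exact cluster at the larger radius.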

🔎 Similar Papers

FastLloyd: Federated, Accurate, Secure, and Tunable k-Means Clustering with Differential Privacy