🤖 AI Summary
This paper addresses robust clustering under sub-Gaussian mixture models in the presence of adversarial outliers, aiming for the statistically optimal misclustering rate. The authors propose a lightweight iterative algorithm based on coordinate-wise median estimation: starting from a robust initialization, it converges within a constant number of iterations of coordinate-wise median updates, while preserving the optimal statistical rate even under adversarial contamination. Theoretically, its misclustering rate attains the information-theoretic lower bound. Empirically, the method significantly outperforms existing robust clustering algorithms on both synthetic and real-world datasets, and matches the performance of Lloyd's algorithm in outlier-free settings. The key innovation is the introduction of coordinate-wise median estimation into mixture model clustering, achieving, for the first time, a unified guarantee of robustness against adversarial outliers and statistical optimality.
📝 Abstract
We consider the problem of clustering data points coming from sub-Gaussian mixtures. Existing methods that provably achieve the optimal mislabeling error, such as the Lloyd algorithm, are usually vulnerable to outliers. In contrast, clustering methods that appear robust to adversarial perturbations are not known to satisfy the optimal statistical guarantees. We propose a simple robust algorithm based on the coordinatewise median that obtains the optimal mislabeling rate even when adversarial outliers are allowed to be present. Our algorithm achieves the optimal error rate in a constant number of iterations when a weak initialization condition is satisfied. In the absence of outliers, in fixed dimensions, our theoretical guarantees are similar to those of the Lloyd algorithm. We conduct extensive experiments on various simulated and public datasets to support the theoretical guarantees of our method.
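The core idea described above, a Lloyd-style alternation where the centroid update uses the coordinatewise median instead of the mean, can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact procedure: the function name, the fixed iteration count, and the user-supplied initialization are assumptions, and the paper's robust initialization step is not reproduced here.

```python
import numpy as np

def median_lloyd(X, init_centers, n_iter=10):
    """Lloyd-style clustering sketch: the assignment step is the usual
    nearest-center rule, but each centroid update takes the coordinatewise
    median of its cluster, which resists adversarial outliers coordinate
    by coordinate. `init_centers` plays the role of a robust initialization."""
    centers = np.asarray(init_centers, dtype=float).copy()
    k = len(centers)
    for _ in range(n_iter):
        # Assignment step: label each point by its nearest current center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: coordinatewise median of each non-empty cluster.
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = np.median(pts, axis=0)
    return labels, centers
```

In a quick synthetic check, a single extreme outlier assigned to a cluster shifts the mean arbitrarily far but barely moves the coordinatewise median, so the recovered centers stay near the true mixture components.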