🤖 AI Summary
Conventional convex clustering suffers from sensitivity to noise/outliers in high-dimensional data, over-strong fusion regularization impeding cluster formation, reliance on pre-specified cluster number $k$, and initialization sensitivity. Method: We propose a robust adaptive convex clustering framework that innovatively incorporates a median-mean estimator into the convex clustering objective function and couples it with an adaptive fusion regularization mechanism, enabling automatic cluster structure identification without specifying $k$. An efficient iterative convex optimization algorithm ensures stable convergence for high-dimensional and large-scale data. Contribution/Results: We establish weak consistency under mild conditions. Extensive experiments demonstrate that our method significantly outperforms existing convex and robust clustering approaches on both noisy synthetic and real-world high-dimensional datasets—particularly excelling under strong noise, high dimensionality, and large scale.
📝 Abstract
Clustering approaches that utilize convex loss functions have recently attracted growing interest in the formation of compact data clusters. Although classical methods like k-means and its wide family of variants are still widely used, all of them require the number of clusters k to be supplied as input, and many are notably sensitive to initialization. Convex clustering provides a more stable alternative by formulating the clustering task as a convex optimization problem, ensuring a unique global solution. However, it faces challenges in handling high-dimensional data, especially in the presence of noise and outliers. Additionally, strong fusion regularization, controlled by the tuning parameter, can hinder effective cluster formation within a convex clustering framework. To overcome these challenges, we introduce a robust approach that integrates convex clustering with the Median of Means (MoM) estimator, thus developing an outlier-resistant and efficient clustering framework that does not necessitate prior knowledge of the number of clusters. By leveraging the robustness of MoM alongside the stability of convex clustering, our method enhances both performance and efficiency, especially on large-scale datasets. Theoretical analysis demonstrates weak consistency under specific conditions, while experiments on synthetic and real-world datasets validate the method's superior performance compared to existing approaches.