Convex Clustering Redefined: Robust Learning with the Median of Means Estimator

📅 2025-11-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Classical clustering methods such as k-means require a pre-specified cluster number $k$ and are sensitive to initialization. Convex clustering avoids both issues, but it is sensitive to noise and outliers in high-dimensional data, and overly strong fusion regularization can impede cluster formation. Method: We propose a robust adaptive convex clustering framework that incorporates the Median of Means (MoM) estimator into the convex clustering objective function and couples it with an adaptive fusion regularization mechanism, enabling automatic cluster structure identification without specifying $k$. An efficient iterative convex optimization algorithm is used to obtain stable convergence on high-dimensional and large-scale data. Contribution/Results: We establish weak consistency under specific conditions. Experiments on noisy synthetic and real-world high-dimensional datasets show that our method outperforms existing convex and robust clustering approaches, particularly under strong noise, high dimensionality, and large scale.

📝 Abstract

Clustering approaches that utilize convex loss functions have recently attracted growing interest in the formation of compact data clusters. Although classical methods like k-means and its wide family of variants are still widely used, all of them require the number of clusters k to be supplied as input, and many are notably sensitive to initialization. Convex clustering provides a more stable alternative by formulating the clustering task as a convex optimization problem, ensuring a unique global solution. However, it faces challenges in handling high-dimensional data, especially in the presence of noise and outliers. Additionally, strong fusion regularization, controlled by the tuning parameter, can hinder effective cluster formation within a convex clustering framework. To overcome these challenges, we introduce a robust approach that integrates convex clustering with the Median of Means (MoM) estimator, thus developing an outlier-resistant and efficient clustering framework that does not necessitate prior knowledge of the number of clusters. By leveraging the robustness of MoM alongside the stability of convex clustering, our method enhances both performance and efficiency, especially on large-scale datasets. Theoretical analysis demonstrates weak consistency under specific conditions, while experiments on synthetic and real-world datasets validate the method's superior performance compared to existing approaches.
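For orientation, the convex clustering problem the abstract refers to is usually written in the following standard form from the literature. This is a sketch, not necessarily the exact objective used in the paper: the symbols $x_i$ (data points), $u_i$ (per-point centroids), $w_{ij}$ (nonnegative fusion weights), $\lambda$ (the tuning parameter), and the choice of norm $q$ are notational assumptions introduced here.

```latex
\min_{u_1,\dots,u_n \in \mathbb{R}^p}\;
  \frac{1}{2}\sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2
  \;+\; \lambda \sum_{i<j} w_{ij}\, \lVert u_i - u_j \rVert_q ,
  \qquad \lambda \ge 0,\; q \in \{1, 2, \infty\}
```

Points whose centroids coincide at the solution ($u_i = u_j$) are assigned to the same cluster, so the number of clusters is determined by $\lambda$ rather than supplied as input. With $\lambda = 0$ every point is its own centroid, and as $\lambda$ grows the fusion penalty merges centroids until all of them coincide, which is the sense in which strong fusion regularization can hinder cluster formation. The squared-error fidelity term is also where outliers exert their influence, which motivates replacing the plain average with a robust estimator.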

Problem

Research questions and friction points this paper is trying to address.

Classical methods such as k-means require specifying the cluster count and are sensitive to initialization, while convex clustering is sensitive to noise and outliers

Strong fusion regularization hinders effective cluster formation in high dimensions

Existing methods struggle with noise and lack robustness in large datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates convex clustering with Median of Means estimator

Develops an outlier-resistant framework that needs no prior knowledge of the number of clusters

Enhances performance and efficiency on large-scale datasets by combining the robustness of MoM with the stability of convex clustering
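The Median of Means estimator named above can be illustrated with a minimal sketch of the generic technique for a one-dimensional sample. It is not the paper's clustering algorithm: the function name `median_of_means`, the parameter `n_blocks`, and the random partitioning scheme are assumptions chosen for illustration.

```python
import numpy as np


def median_of_means(values, n_blocks, rng=None):
    """Median of Means estimate of the mean of a 1-D sample.

    The sample is randomly partitioned into `n_blocks` blocks of nearly
    equal size, the mean of each block is computed, and the median of
    those block means is returned.
    """
    values = np.asarray(values, dtype=float).ravel()
    if not 1 <= n_blocks <= values.size:
        raise ValueError("n_blocks must be between 1 and the sample size")
    rng = np.random.default_rng(rng)
    # Shuffle so that block membership does not depend on input order.
    shuffled = rng.permutation(values)
    blocks = np.array_split(shuffled, n_blocks)
    block_means = np.array([block.mean() for block in blocks])
    return float(np.median(block_means))


# Hypothetical data: 99 clean observations equal to 1 and one gross outlier.
sample = np.concatenate([np.ones(99), [1e6]])
plain_mean = float(sample.mean())                  # pulled far away from 1
mom = median_of_means(sample, n_blocks=5, rng=0)   # stays at the clean value
```

A single outlier can contaminate at most one block, so as long as fewer than half of the blocks are affected, the median of the block means ignores it. Choosing more blocks tolerates more outliers at the cost of noisier block means.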
