Convex Clustering Redefined: Robust Learning with the Median of Means Estimator

📅 2025-11-12
🤖 AI Summary
Classical methods such as k-means require the number of clusters $k$ as input and are often sensitive to initialization. Convex clustering avoids both problems by posing clustering as a convex optimization problem with a unique global solution, but it is sensitive to noise and outliers in high-dimensional data, and overly strong fusion regularization can impede cluster formation. Method: We propose a robust adaptive convex clustering framework that incorporates the Median of Means (MoM) estimator into the convex clustering objective function and couples it with an adaptive fusion regularization mechanism, enabling automatic cluster structure identification without specifying $k$. An iterative convex optimization algorithm handles high-dimensional and large-scale data. Contribution/Results: We establish weak consistency under specific conditions. Experiments on noisy synthetic and real-world high-dimensional datasets show that the method outperforms existing convex and robust clustering approaches, particularly under strong noise, high dimensionality, and large scale.

📝 Abstract
Clustering approaches that utilize convex loss functions have recently attracted growing interest in the formation of compact data clusters. Although classical methods like k-means and its wide family of variants are still widely used, all of them require the number of clusters k to be supplied as input, and many are notably sensitive to initialization. Convex clustering provides a more stable alternative by formulating the clustering task as a convex optimization problem, ensuring a unique global solution. However, it faces challenges in handling high-dimensional data, especially in the presence of noise and outliers. Additionally, strong fusion regularization, controlled by the tuning parameter, can hinder effective cluster formation within a convex clustering framework. To overcome these challenges, we introduce a robust approach that integrates convex clustering with the Median of Means (MoM) estimator, thus developing an outlier-resistant and efficient clustering framework that does not necessitate prior knowledge of the number of clusters. By leveraging the robustness of MoM alongside the stability of convex clustering, our method enhances both performance and efficiency, especially on large-scale datasets. Theoretical analysis demonstrates weak consistency under specific conditions, while experiments on synthetic and real-world datasets validate the method's superior performance compared to existing approaches.
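To make the Median of Means idea concrete, here is a minimal sketch of the textbook MoM estimator for a one-dimensional mean: split the sample into blocks, average each block, and take the median of the block averages. This is not the paper's clustering objective, which the abstract does not spell out; the function name `median_of_means`, the block count, and the toy data are assumptions chosen for illustration.

```python
import numpy as np


def median_of_means(x, n_blocks, seed=None):
    """Textbook median-of-means estimate of the mean of a 1-D sample.

    Illustrative helper (hypothetical name), not code from the paper.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Shuffle so that outliers are spread across blocks at random.
    shuffled = rng.permutation(x)
    # Split into n_blocks nearly equal blocks and average each one.
    blocks = np.array_split(shuffled, n_blocks)
    block_means = np.array([b.mean() for b in blocks])
    # The median ignores the minority of blocks ruined by outliers.
    return float(np.median(block_means))


# Toy data: 1000 clean points centred at 1.0 plus 10 gross outliers.
rng = np.random.default_rng(0)
clean = rng.normal(loc=1.0, scale=1.0, size=1000)
contaminated = np.concatenate([clean, np.full(10, 1000.0)])

plain = contaminated.mean()  # dragged far from 1.0 by the outliers
mom = median_of_means(contaminated, n_blocks=41, seed=0)  # stays near 1.0
print(f"plain mean: {plain:.2f}, median of means: {mom:.2f}")
```

With 41 blocks and only 10 outliers, at most 10 blocks can be contaminated, so the median is always taken over a majority of clean block means. The number of blocks trades robustness (more blocks tolerate more outliers) against the variance of each block mean.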
Problem

Research questions and friction points this paper is trying to address.

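Classical methods such as k-means require the number of clusters as input and are sensitive to initialization; convex clustering avoids this but is sensitive to noise and outliers
Strong fusion regularization, controlled by the tuning parameter, can hinder effective cluster formation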
Existing methods struggle with noise and lack robustness in large datasets
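The effect of the fusion penalty can be seen on the standard convex clustering objective from the literature, 0.5 * sum_i ||x_i - u_i||^2 + lambda * sum_{i<j} w_ij ||u_i - u_j||, where each point x_i gets its own centroid u_i. The sketch below only evaluates this objective for two candidate centroid configurations; it is not a solver and not the paper's MoM-based objective. The function name, the uniform weights, and the toy data are assumptions.

```python
import numpy as np


def convex_clustering_objective(X, U, lam, weights=None):
    """Standard convex clustering objective (illustrative, hypothetical name).

    X: (n, d) data, U: (n, d) one centroid per point, lam: fusion strength.
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    n = X.shape[0]
    if weights is None:
        weights = np.ones((n, n))  # assumption: uniform pairwise weights
    # Data-fit term: each centroid should stay close to its own point.
    fit = 0.5 * np.sum((X - U) ** 2)
    # Fusion term: penalise distances between every pair of centroids.
    penalty = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            penalty += weights[i, j] * np.linalg.norm(U[i] - U[j])
    return fit + lam * penalty


# Two well-separated pairs of points.
X = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
U_separate = X.copy()                       # every point keeps its own centroid
U_fused = np.tile(X.mean(axis=0), (4, 1))   # all centroids collapsed to the mean

for lam in (0.0, 10.0):
    sep = convex_clustering_objective(X, U_separate, lam)
    fused = convex_clustering_objective(X, U_fused, lam)
    print(f"lambda={lam}: separate={sep:.2f}, fused={fused:.2f}")
```

With lambda = 0 the unfused configuration has the lower objective, while with lambda = 10 the fully fused one does, even though the data clearly contain two groups. This is the sense in which an overly strong fusion penalty can wash out cluster structure.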
Innovation

Methods, ideas, or system contributions that make the work stand out.

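Integrates convex clustering with the Median of Means (MoM) estimator
Develops an outlier-resistant framework that does not require prior knowledge of the number of clusters
Combines the robustness of MoM with the stability of convex clustering to improve performance and efficiency, especially on large-scale datasets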
Sourav De
Indian Statistical Institute, Kolkata
Koustav Chowdhury
Indian Statistical Institute, Kolkata
Bibhabasu Mandal
Indian Statistical Institute, Kolkata
Sagar Ghosh
Department of Statistics and Data Science, University of Texas at Austin
Swagatam Das
Professor, Electronics and Communication Sciences Unit, Indian Statistical Institute, Kolkata
Artificial Intelligence, Metaheuristics, Differential Evolution, Swarm Intelligence, Machine Learning
Debolina Paul
Department of Statistics, University of Oxford
Saptarshi Chakraborty
Department of Statistics, University of Michigan