High-Dimensional Sparse Clustering via Iterative Semidefinite Programming Relaxed K-Means

📅 2025-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Clustering high-dimensional data whose true signal lies in an unknown low-dimensional subspace remains challenging, especially when the sparsity pattern or the precision matrix cannot be estimated precisely. Method: We propose an iterative framework that alternates between sparse feature selection and a semidefinite programming (SDP) relaxation of K-means, without requiring precise estimation of the sparse model parameters or the full precision matrix. In each iteration, thresholding a rough estimate of the discriminative direction selects the relevant features, and the SDP relaxation then solves the cluster assignment. When the sparse precision matrix is unknown, the algorithm computes only the minimally required quantities rather than estimating the full model. Contribution/Results: In the isotropic case, the algorithm is motivated by a minimax separation bound for exact recovery of cluster labels, which highlights the critical role of variable selection. Across a range of simulation settings, the method outperforms several state-of-the-art baselines, especially in higher dimensions.

📝 Abstract

We propose an iterative algorithm for clustering high-dimensional data, where the true signal lies in a much lower-dimensional space. Our method alternates between feature selection and clustering, without requiring precise estimation of sparse model parameters. Feature selection is performed by thresholding a rough estimate of the discriminative direction, while clustering is carried out via a semidefinite programming (SDP) relaxation of K-means. In the isotropic case, the algorithm is motivated by the minimax separation bound for exact recovery of cluster labels using varying sparse subsets of features. This bound highlights the critical role of variable selection in achieving exact recovery. We further extend the algorithm to settings with unknown sparse precision matrices, avoiding full model parameter estimation by computing only the minimally required quantities. Across a range of simulation settings, we find that the proposed iterative approach outperforms several state-of-the-art methods, especially in higher dimensions.
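The alternation described in the abstract can be illustrated with a minimal NumPy sketch for two clusters. This is an illustration under stated assumptions, not the authors' implementation: the clustering step uses plain Lloyd updates as a stand-in for the SDP relaxation, the discriminative direction is approximated by the difference of cluster means (reasonable only for isotropic, unit-variance noise), and the threshold rule and the names `lloyd_two_means`, `iterative_sparse_clustering`, `n_outer`, and `c` are hypothetical.

```python
import numpy as np


def lloyd_two_means(X, labels, n_iter=20):
    """Refine a 2-cluster labeling with Lloyd updates (stand-in for the SDP step)."""
    for _ in range(n_iter):
        if labels.min() == labels.max():  # one cluster is empty; stop early
            break
        centers = np.stack([X[labels == k].mean(axis=0) for k in (0, 1)])
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def iterative_sparse_clustering(X, n_outer=5, c=1.0):
    """Alternate feature thresholding and clustering (assumed isotropic, unit-variance noise)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Assumed initialization: sign of the leading left singular vector.
    u = np.linalg.svd(Xc, full_matrices=False)[0][:, 0]
    labels = lloyd_two_means(Xc, (u > 0).astype(int))
    selected = np.arange(p)
    for _ in range(n_outer):
        if labels.min() == labels.max():
            break
        n0, n1 = np.sum(labels == 0), np.sum(labels == 1)
        # Rough discriminative direction: difference of the current cluster means.
        direction = Xc[labels == 1].mean(axis=0) - Xc[labels == 0].mean(axis=0)
        # Hypothetical threshold: noise level of a mean difference times sqrt(2 log p).
        tau = c * np.sqrt(2.0 * np.log(p) * (1.0 / n0 + 1.0 / n1))
        keep = np.flatnonzero(np.abs(direction) > tau)
        if keep.size == 0:  # nothing survives thresholding; keep the last selection
            break
        selected = keep
        # Re-cluster using only the selected features.
        labels = lloyd_two_means(Xc[:, selected], labels)
    return labels, selected
```

Calling `iterative_sparse_clustering(X)` on an `n × p` data matrix returns the estimated labels and the indices of the retained features. Swapping `lloyd_two_means` for an SDP solver would bring the sketch closer to the method described, at a higher computational cost per iteration.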

Problem

Research questions and friction points this paper is trying to address.

Clustering high-dimensional data with low-dimensional signals

Alternating feature selection and clustering without precise parameter estimation

Extending the algorithm to settings with unknown sparse precision matrices

Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative algorithm for high-dimensional clustering

Feature selection via discriminative direction thresholding

Clustering via SDP relaxation of K-means
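For reference, the clustering step named in the last item is commonly written in the following standard form (the Peng–Wei relaxation). The notation is an assumption made here for illustration: $X_S \in \mathbb{R}^{n \times |S|}$ denotes the data restricted to the selected feature set $S$, $K$ is the number of clusters, and the paper's exact formulation may differ.

```latex
\begin{aligned}
\hat{Z} \;=\; \arg\max_{Z \in \mathbb{R}^{n \times n}} \quad & \langle X_S X_S^{\top},\, Z \rangle \\
\text{subject to} \quad & Z \succeq 0, \qquad Z \ge 0 \ \text{(entrywise)}, \\
& Z \mathbf{1}_n = \mathbf{1}_n, \qquad \operatorname{tr}(Z) = K .
\end{aligned}
```

A K-means partition $G_1, \dots, G_K$ corresponds to the feasible point $Z = \sum_{k=1}^{K} |G_k|^{-1} \mathbf{1}_{G_k} \mathbf{1}_{G_k}^{\top}$, so the program relaxes the combinatorial K-means objective to a convex one.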
