Semiparametric Elliptical Mixture Clustering for High-Dimensional Data

📅 2026-05-09
📈 Citations: 0
Influential: 0

🤖 AI Summary
Clustering high-dimensional, heavy-tailed data whose clusters are only approximately elliptical is difficult because such data violate the light-tailed or fully parametric assumptions on which conventional methods rely. This work proposes a semiparametric elliptical mixture framework that avoids pre-specifying a radial distribution family; the model combines cluster-specific centers, an unknown common radial generator, and a common sparse precision-shape matrix. Estimation uses a generalized expectation-maximization (GEM) algorithm that integrates transformed-radius estimation of the radial generator, radial-score-based center updates, and a Tyler-POET-GLASSO update for the precision-shape matrix, alongside a data-driven criterion for selecting the number of clusters. Theoretical analysis establishes high-dimensional consistency of the estimated model components and of the excess misclustering error. Simulation studies and a handwritten-digit application show that the method remains computationally feasible in high dimensions and is competitive and robust, particularly in heavy-tailed elliptical settings.
📝 Abstract
Clustering high-dimensional data is especially challenging when cluster distributions are heavy tailed and only approximately elliptical. Existing high-dimensional methods are largely built for Gaussian or other light-tailed models, whereas classical robust elliptical procedures are mostly low dimensional or rely on fully parametric radial families. We propose a semiparametric elliptical mixture clustering framework with cluster-specific centers, an unknown common radial generator, and a common sparse precision-shape matrix, together with a data-driven rule for selecting the number of clusters. A generalized expectation-maximization (GEM) algorithm is developed by combining transformed-radius estimation of the radial generator, radial-score center updates, and a Tyler-POET-GLASSO update for the common precision-shape matrix. The method avoids specifying a parametric radial family and remains computationally feasible in high dimensions. We establish high-dimensional consistency for the estimated model components and the excess misclustering error. Simulation studies and a handwritten-digit application demonstrate the competitive performance and robustness of the proposed method, particularly in heavy-tailed elliptical settings.
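As an illustration of one ingredient named in the abstract, the following is a minimal sketch of the classical Tyler fixed-point iteration for a shape matrix around a known center. It is not the paper's Tyler-POET-GLASSO update: it assumes more observations than dimensions, imposes no sparsity, and estimates a shape matrix rather than a precision-shape matrix. The function name `tyler_shape` and its parameters `center`, `n_iter`, and `tol` are hypothetical and chosen only for this example.

```python
import numpy as np

def tyler_shape(X, center, n_iter=100, tol=1e-8):
    """Classical Tyler M-estimator of shape (assumes n > p; illustration only)."""
    n, p = X.shape
    Z = X - center                      # center the observations
    S = np.eye(p)                       # start from the identity shape
    for _ in range(n_iter):
        # Squared radii z_i^T S^{-1} z_i under the current shape estimate
        r2 = np.einsum("ij,ij->i", Z @ np.linalg.inv(S), Z)
        r2 = np.maximum(r2, 1e-12)      # guard against points at the center
        # Fixed-point update: (p / n) * sum_i z_i z_i^T / r2_i
        S_new = (p / n) * (Z / r2[:, None]).T @ Z
        S_new *= p / np.trace(S_new)    # fix the scale: trace(S) = p
        converged = np.linalg.norm(S_new - S, ord="fro") < tol
        S = S_new
        if converged:
            break
    return S

# Usage on synthetic heavy-tailed data (multivariate t with 3 degrees of freedom)
rng = np.random.default_rng(0)
n, p = 500, 4
G = rng.standard_normal((n, p))
chi = rng.chisquare(3, size=(n, 1))
X_demo = G / np.sqrt(chi / 3.0)         # heavy-tailed elliptical sample
S_demo = tyler_shape(X_demo, center=np.zeros(p))
```

Each observation is weighted by the inverse of its squared radius, so the estimate depends only on the directions of the centered points. This is why Tyler-type estimators are insensitive to the unknown radial distribution, which fits a setting where the radial generator is left unspecified.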
Problem

Research questions and friction points this paper is trying to address.

high-dimensional clustering
heavy-tailed distributions
elliptical mixtures
robust clustering
semiparametric modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

semiparametric
elliptical mixture
high-dimensional clustering
robust estimation
sparse precision matrix