Statistical Inference for Fuzzy Clustering

📅 2026-01-06
🤖 AI Summary
This study addresses the lack of a statistical inference foundation for traditional fuzzy clustering, which also struggles with imbalanced cluster sizes and uncertainty quantification. The authors propose a weighted fuzzy c-means (WFCM) method that introduces cluster-specific weights to construct a normalized density model, and they use a blockwise MM algorithm to jointly estimate membership degrees, cluster centers, and model parameters. The resulting statistical inference framework for fuzzy clustering comes with theoretical guarantees, including consistency and asymptotic normality of the parameter estimates, and supports likelihood ratio tests and bootstrap confidence intervals. Empirical validation on single-cell RNA-seq and ADNI data indicates that WFCM handles cluster imbalance, enables robust uncertainty quantification, and reveals soft clustering structures ranging from discrete cell subpopulations to the continuous spectrum of Alzheimer’s disease pathology.

📝 Abstract
Clustering is a central tool in biomedical research for discovering heterogeneous patient subpopulations, where group boundaries are often diffuse rather than sharply separated. Traditional methods produce hard partitions, whereas soft clustering methods such as fuzzy $c$-means (FCM) allow mixed memberships and better capture uncertainty and gradual transitions. Despite the widespread use of FCM, principled statistical inference for fuzzy clustering remains limited. We develop a new framework for weighted fuzzy $c$-means (WFCM) for settings with potential cluster size imbalance. Cluster-specific weights rebalance the classical FCM criterion so that smaller clusters are not overwhelmed by dominant groups, and the weighted objective induces a normalized density model with scale parameter $\sigma$ and fuzziness parameter $m$. Estimation is performed via a blockwise majorize-minimize (MM) procedure that alternates closed-form membership and centroid updates with likelihood-based updates of $(\sigma,\mathbf{w})$. The intractable normalizing constant is approximated by importance sampling using a data-adaptive Gaussian mixture proposal. We further provide likelihood ratio tests for comparing cluster centers and bootstrap-based confidence intervals. We establish consistency and asymptotic normality of the maximum likelihood estimator, validate the method through simulations, and illustrate it using single-cell RNA-seq and Alzheimer's Disease Neuroimaging Initiative (ADNI) data. These applications demonstrate stable uncertainty quantification and biologically meaningful soft memberships, ranging from well-separated cell populations under imbalance to a graded AD versus non-AD continuum consistent with disease progression.
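
For orientation, the closed-form membership and centroid updates mentioned in the abstract can be illustrated with the classical, unweighted FCM algorithm. The sketch below is not the paper's WFCM: it omits the cluster-specific weights, the scale parameter $\sigma$, and the likelihood-based updates, none of which are specified in this abstract. The function name `fcm` and its parameters (`init`, `n_iter`, `tol`) are illustrative assumptions.

```python
import numpy as np

def fcm(X, c, m=2.0, init=None, n_iter=100, tol=1e-6, seed=0):
    """Classical fuzzy c-means (illustrative; NOT the paper's weighted variant).

    X: (n, d) data, c: number of clusters, m > 1: fuzziness parameter.
    init: optional (c, d) array of starting centroids (hypothetical helper argument).
    Returns memberships U of shape (n, c) and centroids V of shape (c, d).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    if init is None:
        # Start from c distinct data points chosen at random.
        V = X[rng.choice(X.shape[0], size=c, replace=False)].copy()
    else:
        V = np.asarray(init, dtype=float).copy()
    for _ in range(n_iter):
        # Squared Euclidean distances to every centroid, shape (n, c).
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Centroid update: mean of the data weighted by u_ik^m.
        W = U ** m
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    return U, V

# Toy imbalanced data: 100 points near (0, 0) and 20 points near (5, 5).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)),
               rng.normal(5.0, 0.5, size=(20, 2))])
U, V = fcm(X, c=2, init=[[1.0, 1.0], [4.0, 4.0]])
```

Each row of `U` sums to one, so a point lying between two groups receives split membership instead of a hard label. In the weighted method described above, cluster-specific weights would additionally enter the criterion so that the smaller group is not dominated by the larger one.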
Problem

Research questions and friction points this paper is trying to address.

fuzzy clustering
statistical inference
cluster imbalance
soft membership
uncertainty quantification
Innovation

Methods, ideas, or system contributions that make the work stand out.

weighted fuzzy c-means
statistical inference
majorize-minimize algorithm
importance sampling
asymptotic normality
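
The importance sampling idea listed above can be sketched on a toy problem: estimating a normalizing constant $Z = \int f(x)\,dx$ by drawing from a Gaussian mixture proposal $q$ and averaging $f(x)/q(x)$. The target below is a one-dimensional stand-in with known $Z = \sqrt{2\pi}$, not the WFCM density, and the proposal is fixed by hand rather than data-adaptive; the names `normal_pdf` and `estimate_normalizer` are hypothetical.

```python
import numpy as np

def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def estimate_normalizer(f, means, sds, probs, n_draws=100_000, seed=0):
    """Importance sampling estimate of Z = integral of f.

    The proposal q is a 1-D Gaussian mixture with the given component
    means, standard deviations, and mixing probabilities (assumed inputs).
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # Draw a mixture component for each sample, then draw from that component.
    comp = rng.choice(len(probs), size=n_draws, p=probs)
    x = rng.normal(means[comp], sds[comp])
    # Mixture density q(x) at the sampled points.
    q = sum(p * normal_pdf(x, mu, sd) for p, mu, sd in zip(probs, means, sds))
    # Average of the importance weights f(x) / q(x) estimates Z.
    return float(np.mean(f(x) / q))

# Toy unnormalized target: f(x) = exp(-x^2 / 2), whose true Z is sqrt(2 * pi).
f = lambda x: np.exp(-0.5 * x ** 2)
Z_hat = estimate_normalizer(f, means=[-1.0, 1.0], sds=[1.5, 1.5], probs=[0.5, 0.5])
```

The proposal is chosen with heavier tails than the target so that the weights `f(x) / q(x)` stay bounded, which keeps the estimator's variance small. A data-adaptive proposal, as named in the abstract, would presumably place mixture components where the data concentrate.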
Qiuyi Wu
Department of Biostatistics & Bioinformatics, Duke University
Zihan Zhu
ETH Zurich
computer vision, computer graphics
Anru R. Zhang
Department of Biostatistics & Bioinformatics and Department of Computer Science, Duke University