A Semiparametric Gaussian Mixture Model with Spatial Dependence and Its Application to Whole-Slide Image Clustering Analysis

📅 2025-10-18
🤖 AI Summary
To address insufficient spatial information utilization in whole-slide image (WSI) clustering, this paper proposes a semiparametric Gaussian mixture model (SGMM). Within the standard GMM framework, SGMM models the mixing proportions as nonparametric functions of spatial location, thereby explicitly encoding spatial dependence and encouraging spatially coherent clustering of histologically similar regions. The method integrates conditional probability modeling with a customized expectation-maximization (EM) algorithm and establishes asymptotic theory guaranteeing consistency and convergence of parameter estimation. Compared to conventional GMMs, SGMM achieves superior flexibility and interpretability while preserving statistical rigor. Evaluated on the CAMELYON16 dataset of breast cancer WSIs, SGMM significantly improves clustering accuracy over baseline methods. Extensive finite-sample experiments further demonstrate its effectiveness and robustness under practical data constraints.
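In symbols, the model summarized above can be sketched as follows. The notation is assumed here for illustration and may differ from the paper's: $X_i$ is the feature vector of instance $i$, $S_i$ its spatial location, $K$ the number of mixture components, and $\phi$ the multivariate Gaussian density. The sketch also assumes that the component means and covariances do not vary with location, so only the mixing proportions $\pi_k(\cdot)$ are left nonparametric.

```latex
f(x \mid S_i = s) \;=\; \sum_{k=1}^{K} \pi_k(s)\, \phi(x;\, \mu_k, \Sigma_k),
\qquad \pi_k(s) \ge 0, \qquad \sum_{k=1}^{K} \pi_k(s) = 1 .
```

A classical GMM is the special case in which every $\pi_k(s)$ is constant in $s$.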

📝 Abstract
We develop here a semiparametric Gaussian mixture model (SGMM) for unsupervised learning with valuable spatial information taken into consideration. Specifically, we assume for each instance a random location. Then, conditional on this random location, we assume for the feature vector a standard Gaussian mixture model (GMM). The proposed SGMM allows the mixing probability to be nonparametrically related to the spatial location. Compared with a classical GMM, SGMM is considerably more flexible and allows the instances from the same class to be spatially clustered. To estimate the SGMM, novel EM algorithms are developed and rigorous asymptotic theories are established. Extensive numerical simulations are conducted to demonstrate our finite sample performance. For a real application, we apply our SGMM method to the CAMELYON16 dataset of whole-slide images (WSIs) for breast cancer detection. The SGMM method demonstrates outstanding clustering performance.
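
The idea of location-dependent mixing probabilities can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a Gaussian kernel smoother as the nonparametric estimator of the mixing probabilities, and the function name `fit_spatial_gmm`, the `bandwidth` parameter, and the initialization scheme are all hypothetical choices made for this example.

```python
import numpy as np
from scipy.stats import multivariate_normal


def fit_spatial_gmm(X, S, K, bandwidth=0.1, n_iter=50, seed=0):
    """EM-style fit of a GMM whose mixing weights vary smoothly with location.

    X: (n, d) feature vectors; S: (n, p) spatial locations.
    Assumption: mixing weights are estimated by Gaussian kernel smoothing
    of the responsibilities over locations (an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Row-normalized Gaussian kernel weights between all pairs of locations.
    sq_dist = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-0.5 * sq_dist / bandwidth**2)
    W /= W.sum(axis=1, keepdims=True)

    # Farthest-point initialization of the means (hypothetical choice).
    mu = [X[rng.integers(n)]]
    for _ in range(1, K):
        d2 = np.min([((X - m) ** 2).sum(axis=1) for m in mu], axis=0)
        mu.append(X[np.argmax(d2)])
    mu = np.array(mu)

    # Spherical initial covariances and uniform initial mixing weights.
    cov = np.stack([np.eye(d) * X.var(axis=0).mean()] * K)
    pi = np.full((n, K), 1.0 / K)

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each instance.
        dens = np.column_stack(
            [multivariate_normal.pdf(X, mean=mu[k], cov=cov[k]) for k in range(K)]
        )
        resp = pi * dens + 1e-300  # small constant guards against 0/0
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step (nonparametric part): smooth responsibilities over space.
        pi = W @ resp

        # M-step (parametric part): weighted means and covariances.
        Nk = resp.sum(axis=0)
        mu = (resp.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (resp[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)

    return pi, mu, cov, resp
```

Calling `fit_spatial_gmm(X, S, K=2)` returns the per-instance mixing weights, the component means and covariances, and the responsibilities; `resp.argmax(axis=1)` gives a hard cluster assignment. The kernel matrix is n-by-n, so this sketch only suits modest sample sizes.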
Problem

Research questions and friction points this paper is trying to address.

Incorporates spatial dependence into Gaussian mixture models for clustering
Models nonparametric mixing probabilities based on spatial locations
Applies method to breast cancer detection using whole-slide images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semiparametric Gaussian mixture model with spatial dependence
EM algorithms for model estimation with asymptotic theories
Nonparametric mixing probability linked to spatial location
Baichen Yu
Guanghua School of Management, Peking University, Beijing, China
Jin Liu
School of Statistics and Data Science, KLMDASR, LEBPS and LPMC, Nankai University, Tianjin, China
Hansheng Wang
Guanghua School of Management, Peking University
Statistics in Business