🤖 AI Summary
To make better use of spatial information in unsupervised clustering, this paper proposes a semiparametric Gaussian mixture model (SGMM). Each instance is assigned a random spatial location, and the feature vector follows a standard GMM conditional on that location. The mixing probabilities are modeled as nonparametric functions of the location, which encodes spatial dependence and allows instances from the same class to be spatially clustered. The authors develop new expectation-maximization (EM) algorithms to estimate the model and establish asymptotic theory for the resulting estimators. Compared with a classical GMM, SGMM is considerably more flexible. Numerical simulations demonstrate its finite-sample performance, and an application to the CAMELYON16 dataset of breast cancer whole-slide images (WSIs) reports outstanding clustering performance.
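One way to write the model described above is sketched below. The notation is an assumption introduced here for illustration and is not taken from the paper: $x$ is the feature vector, $s$ the spatial location, $K$ the number of components, and $\phi$ the multivariate Gaussian density. The sketch also assumes that only the mixing probabilities $\pi_k(\cdot)$ depend on location, while the component means and covariances do not.

```latex
f(x \mid s) \;=\; \sum_{k=1}^{K} \pi_k(s)\, \phi\!\left(x;\, \mu_k, \Sigma_k\right),
\qquad
\pi_k(s) \ge 0, \quad \sum_{k=1}^{K} \pi_k(s) = 1 \ \text{ for every } s .
```

Under this reading, a classical GMM is the special case in which every $\pi_k(s)$ is constant in $s$.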
📝 Abstract
We develop a semiparametric Gaussian mixture model (SGMM) for unsupervised learning that takes spatial information into account. Specifically, we assume a random location for each instance. Then, conditional on this random location, we assume a standard Gaussian mixture model (GMM) for the feature vector. The proposed SGMM allows the mixing probability to be nonparametrically related to the spatial location. Compared with a classical GMM, SGMM is considerably more flexible and allows instances from the same class to be spatially clustered. To estimate the SGMM, novel EM algorithms are developed and rigorous asymptotic theories are established. Extensive numerical simulations are conducted to demonstrate the finite-sample performance of the method. As a real application, we apply SGMM to the CAMELYON16 dataset of whole-slide images (WSIs) for breast cancer detection, where it demonstrates outstanding clustering performance.
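To make the idea of location-dependent mixing probabilities concrete, here is a minimal Python sketch of an EM-style procedure. It is not the authors' algorithm: the abstract does not describe their estimator, so this sketch assumes a Gaussian kernel smoother (Nadaraya-Watson style) for the mixing probabilities and location-independent component means and covariances. The function name `fit_sgmm_sketch` and the parameters `bandwidth` and `n_iter` are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal


def fit_sgmm_sketch(X, S, n_components, bandwidth, n_iter=50, seed=0):
    """Illustrative EM for a GMM whose mixing probabilities vary with location.

    X: (n, p) feature vectors; S: (n, d) spatial locations.
    Assumption (not from the paper): pi_k(s) is estimated by kernel-smoothing
    the responsibilities over neighbouring locations.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Row-normalised Gaussian kernel weights between all pairs of locations.
    # This is an n x n matrix, so the sketch only suits small n.
    sq_dist = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-0.5 * sq_dist / bandwidth**2)
    W /= W.sum(axis=1, keepdims=True)

    # Initialise: random data points as means, pooled covariance, flat mixing.
    means = X[rng.choice(n, size=n_components, replace=False)]
    pooled = np.atleast_2d(np.cov(X, rowvar=False)) + 1e-6 * np.eye(p)
    covs = np.stack([pooled] * n_components)
    pi = np.full((n, n_components), 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: responsibilities use the mixing probability at each location.
        dens = np.column_stack([
            multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
            for k in range(n_components)
        ])
        resp = pi * dens + 1e-300  # guard against all-zero rows
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: weighted means and covariances, as in a classical GMM.
        nk = resp.sum(axis=0) + 1e-12
        means = (resp.T @ X) / nk[:, None]
        for k in range(n_components):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + 1e-6 * np.eye(p)

        # Mixing probabilities: kernel-weighted average of responsibilities
        # from nearby locations, so pi varies smoothly over space.
        pi = W @ resp

    return means, covs, pi, resp
```

Cluster labels can be read off as `resp.argmax(axis=1)`. Setting a very large `bandwidth` makes every row of `pi` nearly identical, which recovers the behaviour of a classical GMM with constant mixing proportions.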