Advances in Bayesian random partition models: A comprehensive review

📅 2023-03-30
📈 Citations: 3
Influential: 0
🤖 AI Summary
This review addresses three core challenges in Bayesian random partition models: modeling uncertainty in the number of clusters, constructing appropriate priors on the partition, and summarizing the posterior distribution. Motivated by inconsistency results for existing cluster-number estimators, it classifies partition priors, including the Pitman–Yor process and the Chinese Restaurant Process, and weighs their computational tractability and statistical properties. It also surveys approaches to posterior inference and summarization over the high-dimensional partition space, covering MCMC sampling, variational inference, and posterior consistency analysis, and points out open theoretical problems. The intended applications include clustering in fields such as genomics and medical imaging, where principled uncertainty quantification matters.
📝 Abstract
Clustering is a crucial task in many domains of knowledge, including medicine, epidemiology, genomics, environmental science, economics, and visual sciences. Methodologies for inferring the number of clusters have often been shown to be inconsistent, and incorporating a dependence structure among clusters introduces additional challenges in the estimation process. In a Bayesian framework, clustering is performed by treating the unknown partition as a random object and defining a prior distribution for it. This prior distribution can be induced by models assumed for the observations or directly defined on the partition itself. However, recent findings have revealed difficulties in consistently estimating the number of clusters and, consequently, the partition. Furthermore, summarizing the posterior distribution of the partition remains an open problem due to the high dimensionality of the partition space. This study aims to review Bayesian approaches for random partition models, highlighting the advantages and disadvantages of each method, and suggesting potential avenues for future research.
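To make the idea of a prior "directly defined on the partition itself" concrete, here is a minimal sketch that draws one partition from a Chinese Restaurant Process prior, one of the priors named in the summary above. The function name `sample_crp_partition` and the concentration parameter `alpha` are illustrative choices, not notation taken from the paper.

```python
import random


def sample_crp_partition(n, alpha, seed=None):
    """Draw cluster labels for n items from a Chinese Restaurant Process prior.

    Hypothetical helper for illustration; alpha > 0 is the concentration
    parameter (larger alpha tends to produce more clusters).
    """
    rng = random.Random(seed)
    labels = []  # labels[i] = cluster index of item i
    sizes = []   # sizes[k] = number of items currently in cluster k
    for _ in range(n):
        # An item joins existing cluster k with weight sizes[k],
        # or opens a new cluster with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # new cluster
        else:
            sizes[k] += 1     # existing cluster grows
        labels.append(k)
    return labels


if __name__ == "__main__":
    labels = sample_crp_partition(n=20, alpha=1.0, seed=0)
    print(labels, "clusters:", len(set(labels)))
```

The number of clusters is not fixed in advance: it is a random quantity determined by the draw, which is what lets the model express uncertainty about it.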
Problem

Research questions and friction points this paper is trying to address.

Inconsistent methods for inferring cluster numbers in various domains
Challenges in estimating dependent cluster structures in Bayesian frameworks
Difficulties in summarizing high-dimensional posterior partition distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian random partition models for clustering
Prior distribution on unknown partitions
Reviewing methods for posterior distribution summarization
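
As an illustration of what summarizing a posterior over partitions can look like, one widely used device is the posterior similarity (co-clustering) matrix: the fraction of posterior samples in which two items share a cluster. The sketch below assumes the samples are already available as label lists, for example from an MCMC run. The name `posterior_similarity` is hypothetical, and the page does not say which summaries the review itself covers.

```python
def posterior_similarity(samples):
    """Estimate pairwise co-clustering probabilities from sampled partitions.

    samples: list of label lists, one per posterior draw, all of length n.
    Returns an n x n matrix whose (i, j) entry is the fraction of draws
    in which items i and j carry the same cluster label.
    """
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(samples)
    return [[c / m for c in row] for row in counts]


if __name__ == "__main__":
    # Three toy draws over three items (made-up values for illustration).
    draws = [[0, 0, 1], [0, 1, 1], [0, 0, 1]]
    for row in posterior_similarity(draws):
        print(row)
```

The matrix does not depend on how clusters are labeled, so it sidesteps label switching, but it compresses the posterior to pairwise information only.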