Repulsive mixtures via the sparsity-inducing partition prior

📅 2025-09-30
🤖 AI Summary
Standard Dirichlet priors in Gaussian mixture models (GMMs) struggle to induce weight sparsity and component separation at the same time, which makes it hard to identify dominant clusters automatically. This paper proposes the sparsity-inducing partition (SIP) prior, built on the Selberg Dirichlet distribution, whose repulsive term penalises mixture weights that lie close to each other on the simplex and thereby promotes both sparsity and separation. The SIP prior applies to mixtures with a fixed or a random number of components and comes with an efficient posterior sampling algorithm. In experiments on synthetic benchmarks and on real-world data (children's BMI and dietary behavior), SIP is reported to improve clustering accuracy, stability, and interpretability over baselines, including the standard Dirichlet prior and other sparse priors. It mitigates component overlap and detects a small number of dominant clusters without requiring a pre-specified cluster count or post-hoc merging heuristics.

📝 Abstract
We introduce a novel prior distribution for the weights of mixture models, based on a generalisation of the Dirichlet distribution: the Selberg Dirichlet distribution. This distribution contains a repulsive term, which naturally penalises values that lie close to each other on the simplex, thus encouraging a few dominating clusters. The repulsive behaviour induces additional sparsity on the number of components. We refer to this construction as the sparsity-inducing partition (SIP) prior. By highlighting differences with the conventional Dirichlet distribution, we present relevant properties of the SIP prior and demonstrate their implications across a variety of mixture models, including finite mixtures with a fixed or random number of components, as well as repulsive mixtures. We propose an efficient posterior sampling algorithm and validate our model through an extensive simulation study as well as an application to a biomedical dataset describing children's Body Mass Index and eating behaviour.
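
To make the repulsive term concrete, here is a minimal Python sketch of an unnormalised Selberg Dirichlet log-density. The abstract does not state the density, so the form used here is an assumption: a symmetric Dirichlet kernel multiplied by pairwise factors |w_i - w_j|^(2*gamma), as in the Selberg integral. The function name and the parameters `alpha` and `gamma` are hypothetical and need not match the paper's notation.

```python
import math
from itertools import combinations


def selberg_dirichlet_logpdf_unnorm(w, alpha, gamma):
    """Unnormalised log-density of an ASSUMED Selberg Dirichlet form:

        sum_i (alpha - 1) * log(w_i) + 2 * gamma * sum_{i<j} log|w_i - w_j|

    for w on the probability simplex. Hypothetical parametrisation,
    not taken from the paper.
    """
    if gamma < 0.0:
        raise ValueError("this sketch assumes gamma >= 0")

    # Outside the open simplex the density is zero.
    if any(x <= 0.0 for x in w) or abs(sum(w) - 1.0) > 1e-9:
        return -math.inf

    # Symmetric Dirichlet kernel.
    log_dirichlet = sum((alpha - 1.0) * math.log(x) for x in w)

    # gamma = 0 switches the repulsion off entirely.
    if gamma == 0.0:
        return log_dirichlet

    # Pairwise repulsion between weights.
    log_repulsion = 0.0
    for a, b in combinations(w, 2):
        gap = abs(a - b)
        if gap == 0.0:
            return -math.inf  # tied weights have zero density when gamma > 0
        log_repulsion += 2.0 * gamma * math.log(gap)
    return log_dirichlet + log_repulsion


if __name__ == "__main__":
    near_balanced = [0.36, 0.33, 0.31]
    dominated = [0.70, 0.20, 0.10]
    for name, w in [("near-balanced", near_balanced), ("dominated", dominated)]:
        print(name, selberg_dirichlet_logpdf_unnorm(w, alpha=1.0, gamma=0.5))
```

Under this assumed form, `gamma = 0` recovers the symmetric Dirichlet kernel, while any `gamma > 0` gives zero density to weight vectors with ties and a lower density to nearly balanced weights than to vectors with a few dominating entries.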

Problem

Research questions and friction points this paper is trying to address.

Introduces repulsive prior for sparse mixture modeling
Penalizes similar weights to reduce cluster redundancy
Enhances component sparsity in finite and repulsive mixtures

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Selberg Dirichlet distribution with repulsive term
Sparsity-inducing partition prior reduces the number of mixture components
Efficient posterior sampling algorithm for mixture models