Adaptive Dirichlet Process mixture model with unknown concentration parameter and variance: Scaling high-dimensional clustering via collapsed variational inference

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Standard Dirichlet process mixture models (DPMMs) often struggle to cluster high-dimensional data adaptively because their hyperparameters are fixed and their inference is computationally expensive. This work proposes an adaptive DPMM, fitted by collapsed variational inference, that places weakly informative priors on both the concentration parameter α and the base distribution G0 (including its covariance structure), so that the number of clusters and their distributional shapes are inferred from the data. In high-dimensional Gaussian simulations the method converges substantially faster than a state-of-the-art MCMC sampler, and its performance and robustness are further assessed on negative binomial simulations and in sensitivity analyses. Applied to a leukemia transcriptomic data set, it recovers the known subtypes and also identifies additional, biologically meaningful subclusters.
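Why the concentration parameter α matters for "inferring the number of clusters" can be made concrete with a standard Dirichlet process identity: under a DP(α) prior, the expected number of occupied clusters among n observations is E[K_n] = Σ_{i=1}^{n} α / (α + i − 1), so a fixed α effectively fixes the prior scale of K. The Python sketch below computes this quantity and checks it against a Chinese restaurant process (CRP) simulation. It illustrates only the prior behaviour of α; it is not the authors' collapsed variational algorithm, and the function names are hypothetical. The value n = 72 is taken from the leukemia data set described in the abstract.

```python
import random


def expected_clusters(alpha: float, n: int) -> float:
    """Prior expected number of occupied clusters among n observations
    under a Dirichlet process with concentration alpha.

    E[K_n] = sum_{i=1}^{n} alpha / (alpha + i - 1)
    """
    if alpha <= 0 or n < 1:
        raise ValueError("alpha must be > 0 and n >= 1")
    return sum(alpha / (alpha + i - 1) for i in range(1, n + 1))


def sample_crp_clusters(alpha: float, n: int, rng: random.Random) -> int:
    """Draw one partition of n items from the CRP and return its number of clusters."""
    counts = []  # counts[k] = number of items already assigned to cluster k
    for i in range(n):
        # With i items already seated, the next item opens a new cluster with
        # probability alpha / (alpha + i); otherwise it joins cluster k with
        # probability counts[k] / (alpha + i). The first item always opens one.
        if rng.random() < alpha / (alpha + i):
            counts.append(1)
        else:
            k = rng.choices(range(len(counts)), weights=counts)[0]
            counts[k] += 1
    return len(counts)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 72  # number of samples in the leukemia data set
    for alpha in (0.1, 1.0, 5.0):
        sims = [sample_crp_clusters(alpha, n, rng) for _ in range(2000)]
        print(f"alpha={alpha}: E[K]={expected_clusters(alpha, n):.2f}, "
              f"simulated mean={sum(sims) / len(sims):.2f}")
```

Because E[K_n] grows roughly like α·log(n), a misspecified fixed α can push the prior towards too few or too many clusters; placing a prior on α, as the paper does, lets the data correct this.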

📝 Abstract
We propose a novel method that performs adaptive clustering with a Dirichlet process mixture model (DPMM) using collapsed variational inference (VI), while incorporating weakly-informative priors for the DP concentration parameter α and the base distribution G0. We illustrate the importance of the G0 covariance structure and prior choice by considering different parameterisations of the data covariance matrix. On high-dimensional Gaussian simulations, our model converges substantially faster than a state-of-the-art MCMC slice sampler. We further evaluate performance on Negative Binomial simulations and conduct sensitivity analyses to assess robustness under realistic data conditions. Applied to a publicly available leukemia transcriptomic data set comprising 72 samples and 2,194 gene expression measurements, the method recovers every known subtype while identifying additional gene expression-based subclusters with meaningful biological interpretation.
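To see why the covariance parameterisation matters at this scale, it helps to count how many free covariance parameters a single Gaussian cluster needs in d dimensions. The sketch below assumes three common choices (spherical σ²I, diagonal, and full); the abstract does not state which parameterisations the authors compare, and the helper name is hypothetical. With d = 2,194 genes and only 72 samples, a full covariance per cluster has far more parameters than observations, and a d × d sample covariance estimated from 72 points has rank at most 71, so it is singular.

```python
def covariance_param_count(d: int, structure: str) -> int:
    """Free covariance parameters per cluster for a d-dimensional Gaussian."""
    if structure == "spherical":  # Sigma = sigma^2 * I: one shared variance
        return 1
    if structure == "diagonal":   # Sigma = diag(sigma_1^2, ..., sigma_d^2)
        return d
    if structure == "full":       # unconstrained symmetric positive-definite Sigma
        return d * (d + 1) // 2
    raise ValueError(f"unknown structure: {structure}")


if __name__ == "__main__":
    d = 2194  # number of genes in the leukemia data set
    for s in ("spherical", "diagonal", "full"):
        print(f"{s:>9}: {covariance_param_count(d, s):,} parameters per cluster")
```

Run as a script, this reports 1, 2,194 and 2,407,915 parameters respectively, which is why constrained covariance structures and informative choices for G0 become decisive in high dimensions.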
Problem

Research questions and friction points this paper is trying to address.

Dirichlet Process Mixture Model
high-dimensional clustering
concentration parameter
variance estimation
adaptive clustering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dirichlet Process Mixture Model
Collapsed Variational Inference
Adaptive Clustering
Weakly-informative Priors
High-dimensional Covariance Structure
Annesh Pal
Université de Bordeaux, INSERM, INRIA, Bordeaux Population Health, U1219, SISTM, 33000 Bordeaux, France
A. Mimoun
Centre Hospitalier Universitaire de Bordeaux, Laboratoire d’Hématologie, 33000 Bordeaux, France
Rodolphe Thiébaut
Université de Bordeaux
Medicine, statistics
B. Hejblum
Université de Bordeaux, INSERM, INRIA, Bordeaux Population Health, U1219, SISTM, 33000 Bordeaux, France