Bayesian Variational Inference for Mixed Data Mixture Models

📅 2025-07-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the trade-off between inadequate uncertainty quantification and low computational efficiency in modeling mixed-type (continuous and categorical) data, this paper proposes the first Bayesian mixture modeling framework based on coordinate-ascent variational inference (CAVI), systematically applying variational inference to Bayesian modeling of mixed data. The method employs latent variables to capture data heterogeneity and complex inter-variable dependencies, ensures asymptotic consistency of the posterior means, and substantially reduces computational overhead compared to MCMC; theoretical analysis provides convergence guarantees. Experiments on simulated datasets and the real-world NHANES dataset demonstrate that the approach combines high accuracy with comprehensive uncertainty quantification and high efficiency, reducing computation time by one to two orders of magnitude relative to state-of-the-art methods, making it suitable for large-scale mixed-data applications.
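To make the idea concrete, here is a minimal sketch of what coordinate-ascent updates look like for a mixture model over mixed (Gaussian + categorical) data. This is a hypothetical simplification, not the paper's algorithm: component-parameter posteriors are collapsed to point estimates (so the coordinate updates reduce to EM-style iterations), whereas the paper's full CAVI maintains variational distributions over all parameters. All function and variable names below are illustrative.

```python
import numpy as np

def cavi_mixed_mixture(X_cont, X_cat, K, n_cat, n_iter=100, seed=0):
    """Simplified coordinate-ascent sketch for a K-component mixture on
    mixed data: unit-variance Gaussian likelihoods for continuous features,
    per-feature categorical likelihoods for discrete features.

    X_cont: (n, d_cont) float array of continuous features.
    X_cat:  (n, d_cat) int array of categorical features in {0..n_cat-1}.
    Returns responsibilities r, Gaussian means mu, categorical
    probabilities theta, and mixture weights pi.
    """
    rng = np.random.default_rng(seed)
    n, _ = X_cont.shape
    d_cat = X_cat.shape[1]
    # Variational posterior over cluster assignments (responsibilities).
    r = rng.dirichlet(np.ones(K), size=n)
    for _ in range(n_iter):
        # Coordinate block 1: update component parameters given r.
        Nk = r.sum(axis=0) + 1e-10
        mu = (r.T @ X_cont) / Nk[:, None]          # Gaussian means
        theta = np.ones((K, d_cat, n_cat))          # Laplace-smoothed counts
        for j in range(d_cat):
            for c in range(n_cat):
                theta[:, j, c] += r.T @ (X_cat[:, j] == c)
        theta /= theta.sum(axis=2, keepdims=True)
        pi = Nk / n
        # Coordinate block 2: update responsibilities given parameters.
        log_r = np.log(pi)[None, :] - 0.5 * (
            (X_cont[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        for j in range(d_cat):
            log_r += np.log(theta[:, j, X_cat[:, j]]).T
        log_r -= log_r.max(axis=1, keepdims=True)   # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
    return r, mu, theta, pi
```

In the full method described by the paper, each parameter block would instead carry its own variational factor (e.g. conjugate Gaussian and Dirichlet factors), and the same alternating pattern of closed-form coordinate updates applies; this is what avoids MCMC's sampling cost while still yielding approximate posterior uncertainty.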

📝 Abstract
Heterogeneous, mixed-type datasets including both continuous and categorical variables are ubiquitous, and they enrich data analysis by allowing more complex relationships and interactions to be modelled. Mixture models offer a flexible framework for capturing the underlying heterogeneity and relationships in mixed-type datasets. Most current approaches for modelling mixed data either forgo uncertainty quantification and conduct only point estimation, or use MCMC, which incurs a very high computational cost and does not scale to large datasets. This paper develops a coordinate ascent variational inference (CAVI) algorithm for mixture models on mixed (continuous and categorical) data, which circumvents the high computational cost of MCMC while retaining uncertainty quantification. We demonstrate our approach through simulation studies as well as an applied case study of the NHANES risk factor dataset. In addition, we show that the posterior means from CAVI for this model converge to the true parameter value as the sample size n tends to infinity, providing theoretical justification for our method.
Problem

Research questions and friction points this paper is trying to address.

Develops variational inference for mixed data mixtures
Avoids costly MCMC while quantifying uncertainty
Validates method with simulations and NHANES dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian variational inference for mixed data
Coordinate ascent variational inference algorithm
Uncertainty quantification with low computational cost
Junyang Wang
Department of Mathematics, Imperial College London, London, United Kingdom.
James Bennett
School of Public Health, Imperial College London, London, United Kingdom.
Victor Lhoste
School of Public Health, Imperial College London, London, United Kingdom.
Sarah Filippi
Reader, Imperial College London
Computational Statistics · Statistical Machine Learning · Applications to biomedical problems.