AI Summary
This study addresses the challenge of modeling the joint density of a multivariate response in the presence of categorical covariates and of identifying the covariates' effects. The authors propose a Bayesian semiparametric approach based on a Gaussian copula, in which the conditional marginal distributions are represented by mixture models whose atoms are shared across response coordinates. To let the mixture weights vary flexibly with the covariates, they employ a Tucker tensor decomposition coupled with a coordinate-specific random partition model that adaptively merges covariate levels with similar effects. Embedded in this framework is a coordinate-level predictor selection mechanism that achieves effective dimension reduction without sacrificing modeling flexibility. Extensive simulations and an analysis of NHANES dietary data demonstrate that the method is computationally efficient and memory-friendly, and that it delivers superior performance in both estimation accuracy and predictive capability.
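To make the copula construction concrete, here is a minimal Python sketch of the joint log-density. The joint density is the product of the marginal densities f_j(y_j) and the Gaussian copula density c_R(u) = |R|^(-1/2) exp(-z'(R^(-1) - I)z / 2), where z_j = Φ^(-1)(F_j(y_j)). This relies on several assumptions not stated in the source: the shared atoms are taken to be Gaussian kernels (mean, sd), the coordinate-specific weights are fixed numbers rather than functions of the covariates, and the correlation matrix R is given. The function names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for Phi^{-1}

def mixture_cdf_pdf(y, weights, means, sds):
    """CDF and PDF at y of a Gaussian mixture built from the shared atoms."""
    comps = [NormalDist(m, s) for m, s in zip(means, sds)]
    cdf = sum(w * c.cdf(y) for w, c in zip(weights, comps))
    pdf = sum(w * c.pdf(y) for w, c in zip(weights, comps))
    return cdf, pdf

def gaussian_copula_logpdf(y, weights, means, sds, corr):
    """Log joint density of y under a Gaussian copula with mixture marginals.

    y       : length-d response vector
    weights : (d, K) array; row j holds the mixture weights of coordinate j
    means   : length-K atom locations, shared by all coordinates (assumed Gaussian atoms)
    sds     : length-K atom scales, shared by all coordinates
    corr    : (d, d) copula correlation matrix R
    """
    d = len(y)
    z = np.empty(d)
    log_marginals = 0.0
    for j in range(d):
        u, f = mixture_cdf_pdf(float(y[j]), weights[j], means, sds)
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep Phi^{-1}(u) finite
        z[j] = STD_NORMAL.inv_cdf(u)
        log_marginals += np.log(f)
    _, logdet = np.linalg.slogdet(corr)
    # z'(R^{-1} - I)z computed as z'R^{-1}z - z'z
    quad = z @ np.linalg.solve(corr, z) - z @ z
    return -0.5 * logdet - 0.5 * quad + log_marginals
```

Two sanity checks follow from the formula. When R is the identity, the copula term vanishes and the log-density reduces to the sum of the marginal log-densities. When there is a single standard-normal atom, it reduces to a multivariate normal log-density with covariance R.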
Abstract
We propose a flexible Bayesian approach for estimating the joint density of a multivariate outcome of interest in the presence of categorical covariates. Leveraging a Gaussian copula framework, our method effectively captures the dependence structure across different coordinates of the multivariate response. The conditional (on covariates) marginal (across outcomes) distributions are modeled as flexible mixtures with shared atoms across coordinates, while the mixture weights are allowed to vary with covariates through a novel Tucker tensor factorization-based structure, which enables the identification of coordinate-specific subsets of influential covariates. In particular, we replace the traditional mode matrices with coordinate-specific random partition models on the covariate levels, offering a flexible mechanism to aggregate covariate levels that exhibit similar effects on the response. Additionally, to handle settings with many covariates, we introduce a Markov chain Monte Carlo algorithm that scales with the number of aggregated levels rather than the number of original levels, significantly reducing memory requirements and improving computational efficiency. We demonstrate the method's numerical performance through simulation experiments and its practical applicability through the analysis of NHANES dietary data.
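The abstract does not spell out the exact weight model, so the following is only a hypothetical reading of "replacing the traditional mode matrices with random partitions". Suppose each covariate's mode matrix is a hard 0/1 assignment of its levels to groups. The Tucker contraction then collapses to indexing a core tensor at the group labels of the observed levels. The names `labels`, `core`, `partitioned_tucker_weights`, and `core_size` are illustrative. The partitions are fixed here instead of being sampled by MCMC, and the toy sizes are invented for the example.

```python
import numpy as np

def partitioned_tucker_weights(x, labels, core):
    """Return the mixture-weight vector for one response coordinate.

    Hypothetical sketch: each mode matrix is a hard partition of covariate
    levels, so the Tucker contraction is just indexing the core tensor.

    x      : sequence of p observed levels (0-based ints), one per covariate
    labels : list of p integer arrays; labels[l][v] is the group of level v
             of covariate l
    core   : array of shape (k_1, ..., k_p, K); each core[g_1, ..., g_p, :]
             is a probability vector over the K shared atoms
    """
    groups = tuple(int(labels[l][v]) for l, v in enumerate(x))
    return core[groups]

def core_size(labels, n_atoms):
    """Number of core cells: product of aggregated level counts times K."""
    n_groups = [int(np.max(lab)) + 1 for lab in labels]
    return int(np.prod(n_groups)) * n_atoms

rng = np.random.default_rng(0)
levels = [4, 3, 5]                       # original number of levels per covariate
labels = [np.array([0, 0, 1, 1]),        # covariate 0: 4 levels -> 2 groups
          np.array([0, 0, 0]),           # covariate 1: 1 group -> no effect
          np.array([0, 1, 1, 2, 2])]     # covariate 2: 5 levels -> 3 groups
K = 4                                    # number of shared atoms
k = [int(lab.max()) + 1 for lab in labels]
core = rng.dirichlet(np.ones(K), size=tuple(k))  # shape (2, 1, 3, 4)

# Levels (0, 0, 1) and (1, 2, 2) fall in the same groups, so share weights.
w_a = partitioned_tucker_weights((0, 0, 1), labels, core)
w_b = partitioned_tucker_weights((1, 2, 2), labels, core)
```

In this toy configuration the core holds 2 × 1 × 3 × 4 = 24 weights, whereas a level-by-level table would need 4 × 3 × 5 × 4 = 240. This illustrates why cost can scale with aggregated rather than original levels. A covariate whose levels all share one group (covariate 1 here) cannot change that coordinate's weights, which is one way a partition can act as predictor selection.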