🤖 AI Summary
This study addresses the integration of multivariate categorical responses with auxiliary covariates by proposing a covariate-assisted grade of membership model that avoids joint likelihood modeling. The approach combines both sources of information through their shared low-rank simplex geometry, using likelihood-free spectral estimation with heteroskedastic principal component analysis to handle high-dimensional, heteroskedastic noise. A balance parameter controls the relative contribution of each data source. Theoretically, the work establishes weaker identifiability conditions than the covariate-free model requires and derives finite-sample, entrywise error bounds showing that covariates can yield faster convergence rates in high-dimensional regimes. Empirically, simulation studies and an application to educational assessment data illustrate gains in computational efficiency, statistical accuracy, and interpretability.
📝 Abstract
The grade of membership model is a flexible latent variable model for analyzing multivariate categorical data through individual-level mixed membership scores. In many modern applications, auxiliary covariates are collected alongside responses and encode information about the same latent structure. Traditional approaches to incorporating such covariates typically rely on fully specified joint likelihoods, which are computationally intensive and sensitive to misspecification. We introduce a covariate-assisted grade of membership model that integrates response and covariate information by exploiting their shared low-rank simplex geometry, rather than modeling their joint distribution. We propose a likelihood-free spectral estimation procedure that combines heterogeneous data sources through a balance parameter controlling their relative contribution. To accommodate high-dimensional and heteroskedastic noise, we employ heteroskedastic principal component analysis before performing simplex-based geometric recovery. Our theoretical analysis establishes weaker identifiability conditions than those required in the covariate-free model, and further derives finite-sample, entrywise error bounds for both mixed membership scores and item parameters. These results demonstrate that auxiliary covariates can provably improve latent structure recovery, yielding faster convergence rates in high-dimensional regimes. Simulation studies and an application to educational assessment data illustrate the computational efficiency, statistical accuracy, and interpretability gains of the proposed method. The code for reproducing these results is open source and available at https://github.com/Toby-X/Covariate-Assisted-GoM.
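
The pipeline described in the abstract (combine responses and covariates through a balance parameter, denoise with heteroskedastic PCA, then recover the simplex) can be illustrated with a minimal Python sketch. It is not the authors' implementation and is not taken from the linked repository. Every name here (`hetero_pca`, `successive_projection`, `estimate_membership`, `alpha`) is hypothetical, the combined Gram matrix `R R^T + alpha * X X^T` is only one plausible way to weight the two sources, the successive projection algorithm stands in for whatever vertex-finding step the paper uses, and the data are simulated under assumed binary responses and Gaussian covariates.

```python
import itertools
import numpy as np

def hetero_pca(gram, rank, n_iter=30):
    """Top-`rank` eigenvectors of a Gram matrix whose diagonal is iteratively imputed."""
    m = gram.copy()
    np.fill_diagonal(m, 0.0)  # the diagonal carries the heteroskedastic noise bias
    for _ in range(n_iter + 1):
        vals, vecs = np.linalg.eigh(m)
        top = np.argsort(np.abs(vals))[::-1][:rank]
        u = vecs[:, top]
        low_rank = (u * vals[top]) @ u.T
        np.fill_diagonal(m, np.diag(low_rank))  # refill diagonal from the low-rank fit
    return u

def successive_projection(u, k):
    """Pick k rows of u that approximate the simplex vertices (assumed stand-in step)."""
    y, picked = u.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.sum(y ** 2, axis=1)))  # farthest remaining row
        picked.append(j)
        v = y[j] / np.linalg.norm(y[j])
        y = y - np.outer(y @ v, v)  # project out the chosen direction
    return picked

def estimate_membership(responses, covariates, k, alpha):
    """Spectral estimate of mixed membership scores from responses plus covariates."""
    # Assumed combination rule: alpha balances the two Gram matrices.
    gram = responses @ responses.T + alpha * (covariates @ covariates.T)
    u = hetero_pca(gram, k)
    vertices = u[successive_projection(u, k)]
    raw = u @ np.linalg.inv(vertices)       # barycentric coordinates w.r.t. the vertices
    raw = np.clip(raw, 1e-12, None)         # enforce non-negativity
    return raw / raw.sum(axis=1, keepdims=True)

# Simulated example (all sizes and noise levels are illustrative assumptions).
rng = np.random.default_rng(0)
n, n_items, n_cov, k = 300, 300, 10, 3
pi = rng.dirichlet(0.5 * np.ones(k), size=n)
pi[: 3 * k] = np.repeat(np.eye(k), 3, axis=0)             # a few pure individuals per profile
theta = 0.1 + 0.8 * np.eye(k)[rng.integers(k, size=n_items)]  # item parameters in {0.1, 0.9}
responses = rng.binomial(1, pi @ theta.T).astype(float)   # binary responses
loadings = 2.0 * rng.normal(size=(n_cov, k))
covariates = pi @ loadings.T + 0.5 * rng.normal(size=(n, n_cov))

pi_hat = estimate_membership(responses, covariates, k, alpha=1.0)

# Profiles are identified only up to relabeling, so score the best column permutation.
err = min(
    np.mean(np.abs(pi_hat[:, list(perm)] - pi))
    for perm in itertools.permutations(range(k))
)
print(f"mean absolute error of membership scores: {err:.3f}")
```

Setting `alpha=0.0` in this sketch reduces it to a response-only spectral estimate, which gives a simple way to compare the two settings on simulated data; how the paper actually selects the balance parameter is not specified in the abstract.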