Covariate-assisted Grade of Membership Models via Shared Latent Geometry

📅 2026-01-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses how to integrate multivariate categorical responses with auxiliary covariates, proposing a covariate-assisted strategy that avoids modeling their joint distribution. The approach unifies both sources of information through a shared low-rank simplex geometry and estimates it with a likelihood-free spectral procedure, using heteroskedastic principal component analysis to handle high-dimensional, heteroskedastic noise. A balance parameter controls the relative contribution of the two data sources. Theoretically, the work establishes weaker identifiability conditions than the covariate-free model requires and derives finite-sample, entrywise error bounds showing that covariates can accelerate convergence rates in high-dimensional settings. Empirically, simulations and an application to educational assessment data illustrate gains in computational efficiency, statistical accuracy, and interpretability.

📝 Abstract

The grade of membership model is a flexible latent variable model for analyzing multivariate categorical data through individual-level mixed membership scores. In many modern applications, auxiliary covariates are collected alongside responses and encode information about the same latent structure. Traditional approaches to incorporating such covariates typically rely on fully specified joint likelihoods, which are computationally intensive and sensitive to misspecification. We introduce a covariate-assisted grade of membership model that integrates response and covariate information by exploiting their shared low-rank simplex geometry, rather than modeling their joint distribution. We propose a likelihood-free spectral estimation procedure that combines heterogeneous data sources through a balance parameter controlling their relative contribution. To accommodate high-dimensional and heteroskedastic noise, we employ heteroskedastic principal component analysis before performing simplex-based geometric recovery. Our theoretical analysis establishes weaker identifiability conditions than those required in the covariate-free model, and further derives finite-sample, entrywise error bounds for both mixed membership scores and item parameters. These results demonstrate that auxiliary covariates can provably improve latent structure recovery, yielding faster convergence rates in high-dimensional regimes. Simulation studies and an application to educational assessment data illustrate the computational efficiency, statistical accuracy, and interpretability gains of the proposed method. The code for reproducing these results is open source and available at https://github.com/Toby-X/Covariate-Assisted-GoM.
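
The pipeline the abstract describes (combine the two sources with a balance parameter, denoise with heteroskedastic PCA, then recover the simplex) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation, which lives in the repository linked above. Several choices here are assumptions that the abstract does not confirm: the two sources are combined by adding their Gram matrices with weight `alpha`, heteroskedastic PCA is implemented as iterative diagonal imputation, and the simplex vertices are located with the successive projection algorithm. All function names are hypothetical.

```python
import numpy as np


def hetero_pca(gram, rank, n_iter=50):
    """Estimate the top-`rank` eigenvectors of a Gram matrix whose diagonal
    is inflated by heteroskedastic noise, via iterative diagonal imputation."""
    g = np.array(gram, dtype=float)
    np.fill_diagonal(g, 0.0)  # discard the noisy diagonal
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(g)
        top = np.argsort(np.abs(vals))[-rank:]
        low_rank = (vecs[:, top] * vals[top]) @ vecs[:, top].T
        np.fill_diagonal(g, np.diag(low_rank))  # re-impute the diagonal only
    vals, vecs = np.linalg.eigh(g)
    top = np.argsort(np.abs(vals))[-rank:]
    return vecs[:, top]


def successive_projection(points, k):
    """Pick k rows that approximate the simplex vertices: repeatedly take the
    row with the largest norm, then project that direction out of all rows."""
    residual = np.array(points, dtype=float)
    chosen = []
    for _ in range(k):
        i = int(np.argmax(np.linalg.norm(residual, axis=1)))
        chosen.append(i)
        v = residual[i] / np.linalg.norm(residual[i])
        residual = residual - np.outer(residual @ v, v)
    return chosen


def covariate_assisted_gom(responses, covariates, k, alpha=1.0):
    """Sketch of a covariate-assisted spectral estimator (assumed form).

    responses:  (n, j) array of categorical responses, here binary.
    covariates: (n, p) array of auxiliary covariates.
    alpha:      balance parameter weighting the covariate Gram matrix.
    """
    # Assumption: the sources are combined by a weighted sum of Gram matrices.
    gram = responses @ responses.T + alpha * (covariates @ covariates.T)
    u = hetero_pca(gram, k)
    vertices = successive_projection(u, k)
    # Barycentric coordinates of each row with respect to the chosen vertices.
    weights = u @ np.linalg.inv(u[vertices])
    weights = np.clip(weights, 0.0, None)
    weights /= np.maximum(weights.sum(axis=1, keepdims=True), 1e-12)
    return weights, vertices


# Small simulated example (all settings are illustrative).
rng = np.random.default_rng(0)
n, j, p, k = 300, 40, 5, 3
pi = rng.dirichlet(np.ones(k), size=n)
pi[:k] = np.eye(k)  # a few pure subjects so the simplex vertices are observed
theta = rng.uniform(0.1, 0.9, size=(j, k))
responses = rng.binomial(1, pi @ theta.T).astype(float)
covariates = pi @ rng.normal(size=(k, p)) + 0.5 * rng.normal(size=(n, p))
pi_hat, vertices = covariate_assisted_gom(responses, covariates, k, alpha=1.0)
print(pi_hat[:5].round(2))
```

In this sketch, setting `alpha=0` ignores the covariates entirely, while larger values let them dominate the spectral step. The estimated membership columns are identified only up to a permutation of the latent profiles, so any comparison with ground truth needs to align columns first.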

Problem

Research questions and friction points this paper is trying to address.

grade of membership

covariate-assisted

latent structure

shared geometry

mixed membership

Innovation

Methods, ideas, or system contributions that make the work stand out.

covariate-assisted modeling

shared latent geometry

likelihood-free spectral estimation