🤖 AI Summary
Latent class analysis (LCA) often suffers from limited interpretability because the estimated item response probability matrix is complex. This work proposes a post-estimation sparsification method that refines a traditional LCA fit using an item-wise pseudolikelihood with sparse regularization, which penalizes the number of distinct response probability levels per item. The refinement automatically merges redundant response levels, making the latent classes easier to interpret. The method is computationally efficient and comes with asymptotic consistency guarantees for recovering the true sparse structure. With the number of latent classes selected by the Bayesian information criterion (BIC), the proposed framework performs well in both simulation studies and an empirical analysis of social role performance survey data, yielding concise and interpretable latent class characterizations. The implementation code is publicly available.
📝 Abstract
Latent Class Analysis (LCA) is widely used to identify unobserved subgroups in the social and behavioural sciences. A long-standing challenge for LCA is the interpretability of the latent classes, owing to the high complexity of the estimated item response probability matrix. To address this, we propose a computationally efficient post-estimation refinement procedure that enhances model interpretability by producing a sparse model estimate. The method begins by estimating a classical, unrestricted latent class model and determining the number of classes using the Bayesian information criterion (BIC). A refinement step then performs model selection on the item-specific response probabilities, starting from the initial estimate. This refinement penalises the number of distinct response probability levels per item, collapsing redundant levels to yield a sparse matrix that is considerably easier to interpret than those produced by classical LCA. We provide asymptotic theory showing that the proposed procedure consistently recovers the sparse pattern of the response probabilities for each item, and we validate its performance through extensive simulations. The practical value of the method is illustrated via an application to survey data on social role performance, where it provides a parsimonious and clear characterisation of the resulting latent classes. The code for implementing the proposed method is publicly available at https://github.com/florence07/Sparse-LCA-Refinement.
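To make the idea of collapsing redundant levels concrete, the following is a minimal sketch for a single binary item. It is not the authors' implementation: the function name `collapse_levels`, the weighted squared-error criterion, and the `penalty` value are all assumptions chosen for illustration, whereas the refinement described in the abstract is built on the fitted latent class model. The sketch takes the class-specific response probabilities from an initial fit, tries every grouping of the sorted values, and keeps the grouping that best trades fit against the number of distinct levels.

```python
from itertools import combinations


def collapse_levels(probs, weights, penalty):
    """Collapse one item's class-specific response probabilities.

    probs[k]   : initial estimate of P(positive response | class k)
    weights[k] : estimated proportion of class k (assumed positive)
    penalty    : cost per distinct probability level (hypothetical tuning value)

    Illustrative criterion only: weighted squared error + penalty * n_levels.
    Returns (fitted probabilities in the original class order, n_levels).
    """
    n_classes = len(probs)
    order = sorted(range(n_classes), key=lambda k: probs[k])
    best_score, best_fit, best_levels = float("inf"), None, 0

    # Enumerate every split of the sorted probabilities into contiguous groups.
    # This is exhaustive (2^(K-1) splits), which is fine for a handful of classes.
    for n_cuts in range(n_classes):
        for cuts in combinations(range(1, n_classes), n_cuts):
            bounds = [0, *cuts, n_classes]
            fitted = [0.0] * n_classes
            loss = 0.0
            for lo, hi in zip(bounds, bounds[1:]):
                group = order[lo:hi]
                total = sum(weights[k] for k in group)
                # Each group shares one level: its weighted mean probability.
                level = sum(weights[k] * probs[k] for k in group) / total
                for k in group:
                    fitted[k] = level
                    loss += weights[k] * (probs[k] - level) ** 2
            score = loss + penalty * (n_cuts + 1)
            if score < best_score:  # strict "<" keeps the sparser fit on ties
                best_score, best_fit, best_levels = score, fitted, n_cuts + 1
    return best_fit, best_levels


# Four classes whose probabilities cluster around two values (made-up numbers).
fitted, n_levels = collapse_levels([0.91, 0.88, 0.12, 0.15], [0.25] * 4, penalty=0.01)
print(n_levels, [round(p, 3) for p in fitted])
```

With these made-up inputs the four probabilities collapse to two levels, roughly 0.895 for the first two classes and 0.135 for the last two, so the item simply separates "high" from "low" classes. A larger penalty would merge everything into a single level, and a penalty of zero would leave the initial estimates untouched.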