🤖 AI Summary
This study addresses the challenges of identifying latent stigma and scaling its assessment in HIV clinical practice. We propose a hybrid topic modeling approach that combines domain knowledge with data-driven methods. Applied to de-identified clinical notes from a cohort of 9,140 people living with HIV, the method integrates seed-word snowball expansion, multi-strategy keyword filtering, and expert-in-the-loop curation, coupled with LDA topic modeling, term-frequency analysis, and subgroup-specific topic variation testing. It systematically uncovers five core themes, including “Mental Health Concern and Stigma” and “Social Support and Engagement.” Results reveal statistically significant thematic differences across age subgroups, demonstrating the method’s high sensitivity, interpretability, and scalability for detecting latent stigma signals in real-world clinical settings. To our knowledge, this constitutes the first reproducible NLP framework enabling automated, timely, and clinically actionable stigma assessment in HIV care.
📝 Abstract
Objective: To characterize stigma dimensions and related social and behavioral circumstances in people living with HIV (PLWHs) seeking care, using natural language processing methods applied to a large collection of electronic health record (EHR) clinical notes from a large integrated health system in the southeastern United States. Methods: We identified a cohort of 9,140 PLWHs from the UF Health IDR and performed topic modeling with Latent Dirichlet Allocation (LDA) to uncover stigma dimensions and related social and behavioral circumstances. Domain experts created a seed list of HIV-related stigma keywords, then applied a snowball strategy, iteratively reviewing notes for additional terms until saturation was reached. To surface more of the target topics, we tested three keyword-based filtering strategies. Domain experts manually reviewed the detected topics using their prevalent terms and key discussion topics. Word frequency analysis was used to highlight the prevalent terms associated with each topic. In addition, we conducted topic variation analysis among subgroups to examine differences across age- and sex-specific demographics. Results and Conclusion: Topic modeling on sentences containing at least one keyword uncovered a wide range of themes associated with HIV-related stigma and related social and behavioral circumstances, including "Mental Health Concern and Stigma", "Social Support and Engagement", "Limited Healthcare Access and Severe Illness", and "Treatment Refusal and Isolation", among others. Topic variation analysis across age subgroups revealed differences. Extracting and understanding HIV-related stigma dimensions and related social and behavioral circumstances from EHR clinical notes enables scalable, time-efficient assessment, overcoming the limitations of traditional questionnaires and improving patient outcomes.
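To make the filtering and word frequency steps concrete, the following is a minimal sketch of sentence-level keyword filtering followed by term counting. It is not the authors' implementation: the seed keywords, stopword list, example notes, and all function names are hypothetical, and the sentence splitter is deliberately naive.

```python
import re
from collections import Counter

# Hypothetical seed keywords; the study's expert-curated list is not reproduced here.
SEED_KEYWORDS = {"stigma", "ashamed", "isolated", "judged"}

# Hypothetical, very small stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "by", "about",
             "her", "his", "she", "he", "patient", "feels", "reports"}


def split_sentences(note):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def filter_sentences(notes, keywords):
    """Keep only sentences that contain at least one keyword."""
    kept = []
    for note in notes:
        for sentence in split_sentences(note):
            if keywords & set(tokenize(sentence)):
                kept.append(sentence)
    return kept


def term_frequencies(sentences, stopwords):
    """Count non-stopword tokens across the kept sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(t for t in tokenize(sentence) if t not in stopwords)
    return counts


# Made-up example notes, not real clinical text.
notes = [
    "Patient reports feeling ashamed about her diagnosis. Blood pressure is normal.",
    "He feels isolated and judged by family. Refill requested.",
]

kept = filter_sentences(notes, SEED_KEYWORDS)
freqs = term_frequencies(kept, STOPWORDS)
print(kept)
print(freqs.most_common(5))
```

In a fuller pipeline, the kept sentences would be vectorized and passed to an LDA implementation (for example, scikit-learn's `LatentDirichletAllocation`), and the same term counts could be computed per topic or per demographic subgroup.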