Why Are We Lonely? Leveraging LLMs to Measure and Understand Loneliness in Caregivers and Non-caregivers

📅 2026-04-09
🤖 AI Summary
This study investigates how the manifestation and underlying causes of loneliness differ between caregivers and non-caregivers. The authors develop a large language model-driven pipeline that pairs an expert-developed loneliness evaluation framework with an expert-informed typology of causes to annotate and compare Reddit posts. The pipeline applies GPT-4o, GPT-5-nano, and GPT-5, with human validation to safeguard data quality, to build a high-quality Reddit corpus on loneliness. The loneliness evaluation framework reaches average accuracies of 76.09% for caregivers and 79.78% for non-caregivers, and cause categorization reaches micro-averaged F1 scores of 0.825 and 0.80, respectively. The analysis finds that caregivers' loneliness is predominantly linked to caregiving roles, identity recognition, and feelings of abandonment.
📝 Abstract
This paper presents an LLM-driven approach for constructing diverse social media datasets to measure and compare loneliness in the caregiver and non-caregiver populations. We introduce an expert-developed loneliness evaluation framework and an expert-informed typology of causes of loneliness, both designed for analyzing social media text. Using a human-validated data processing pipeline, we apply GPT-4o, GPT-5-nano, and GPT-5 to build a high-quality Reddit corpus and analyze loneliness across both populations. The loneliness evaluation framework achieved average accuracies of 76.09% and 79.78% for caregivers and non-caregivers, respectively. The cause categorization framework achieved micro-aggregate F1 scores of 0.825 and 0.80 for caregivers and non-caregivers, respectively. Across populations, we observe substantial differences in the distribution of types of causes of loneliness. Caregivers' loneliness was predominantly linked to caregiving roles, identity recognition, and feelings of abandonment, indicating distinct loneliness experiences between the two groups. Demographic extraction further demonstrates the viability of Reddit for building a diverse caregiver loneliness dataset. Overall, this work establishes an LLM-based pipeline for creating high-quality social media datasets for studying loneliness and demonstrates its effectiveness in analyzing population-level differences in the manifestation of loneliness.
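For readers unfamiliar with the cause-categorization metric, the sketch below shows one common way a micro-averaged F1 score is computed when each post may carry several cause labels. It is an illustration only, not the authors' evaluation code: the function name `micro_f1` and the example labels are hypothetical, and the abstract does not specify whether the task is multi-label or how the scores were aggregated.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over multi-label annotations.

    gold, pred: lists of sets of labels, one set per post (assumed format).
    Counts are pooled across all posts and labels before computing
    precision and recall, which is what "micro" averaging means.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and present in gold
        fp += len(p - g)   # labels predicted but absent from gold
        fn += len(g - p)   # gold labels the prediction missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, loosely named after the causes in the abstract.
gold = [{"caregiving role"}, {"abandonment", "identity recognition"}]
pred = [{"caregiving role"}, {"abandonment"}]
score = micro_f1(gold, pred)
```

In this toy case the pooled counts are 2 true positives, 0 false positives, and 1 false negative, so precision is 1.0 and recall is 2/3.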
Problem

Research questions and friction points this paper is trying to address.

loneliness
caregivers
social media
population differences
cause categorization
Innovation

Methods, ideas, or system contributions that make the work stand out.

large language models
loneliness measurement
social media analysis
caregiver mental health
expert-informed typology
Michelle Damin Kim
Department of Computer Science, Emory University, Atlanta, GA, USA
Ellie S. Paek
Research Scientist, Emory University
Yufen Lin
Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, GA, USA
Emily Mroz
Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, GA, USA
Jane Chung
Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, GA, USA
Jinho D. Choi
Associate Professor, Emory University
Natural Language Processing, Computational Linguistics, Conversational AI