🤖 AI Summary
This study investigates how the manifestation and underlying causes of loneliness differ between caregivers and non-caregivers. To this end, we developed a large language model–driven analytical pipeline that combines an expert-defined loneliness assessment framework with an expert-informed typology of causes to annotate and compare Reddit posts. By pairing domain expertise with GPT-4o, GPT-5-nano, and GPT-5, and by adding human validation to ensure data quality, we constructed a high-quality, interpretable dataset on loneliness derived from social media. The loneliness evaluation framework reaches average accuracies of 76.09% for caregivers and 79.78% for non-caregivers, and cause classification reaches micro-averaged F1 scores of 0.825 and 0.80, respectively. The analysis shows that caregivers' loneliness is predominantly linked to caregiving roles, identity recognition, and feelings of abandonment.
📝 Abstract
This paper presents an LLM-driven approach for constructing diverse social media datasets to measure and compare loneliness in caregiver and non-caregiver populations. We introduce an expert-developed loneliness evaluation framework and an expert-informed typology for categorizing the causes of loneliness expressed in social media text. Using a human-validated data processing pipeline, we apply GPT-4o, GPT-5-nano, and GPT-5 to build a high-quality Reddit corpus and analyze loneliness across both populations. The loneliness evaluation framework achieved average accuracies of 76.09% and 79.78% for caregivers and non-caregivers, respectively. The cause categorization framework achieved micro-averaged F1 scores of 0.825 and 0.80 for caregivers and non-caregivers, respectively. Across populations, we observe substantial differences in the distribution of the types of causes of loneliness. Caregivers' loneliness was predominantly linked to caregiving roles, identity recognition, and feelings of abandonment, indicating distinct loneliness experiences between the two groups. Demographic extraction further demonstrates the viability of Reddit for building a diverse caregiver loneliness dataset. Overall, this work establishes an LLM-based pipeline for creating high-quality social media datasets for studying loneliness and demonstrates its effectiveness in analyzing population-level differences in the manifestation of loneliness.
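For readers unfamiliar with the metric, the micro-level F1 reported for cause categorization pools true positives, false positives, and false negatives over all posts and labels before computing a single score. The sketch below illustrates that computation in plain Python. It assumes a multi-label setup in which each post may carry several cause labels; the abstract does not state this, and the function name `micro_f1`, the label names, and the toy annotations are hypothetical rather than taken from the paper.

```python
from typing import Iterable, Set


def micro_f1(gold: Iterable[Set[str]], pred: Iterable[Set[str]]) -> float:
    """Micro-averaged F1 over per-post label sets.

    Counts are pooled across every post and label, so frequent labels
    weigh more heavily than rare ones.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and present in the gold set
        fp += len(p - g)   # labels predicted but absent from the gold set
        fn += len(g - p)   # gold labels that were not predicted
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when there are no labels.
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy annotations; label names are illustrative only.
gold = [{"caregiving_role"}, {"abandonment", "identity"}, {"other"}]
pred = [{"caregiving_role"}, {"abandonment"}, {"identity"}]

score = micro_f1(gold, pred)
print(round(score, 3))  # 2 TP, 1 FP, 2 FN -> 4/7, printed as 0.571
```

Because the counts are pooled, a score such as 0.825 summarizes performance over all label decisions at once rather than averaging per-category scores; a macro average would instead weight each cause category equally.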