From Reddit to Generative AI: Evaluating Large Language Models for Anxiety Support Fine-tuned on Social Media Data

📅 2025-05-24
📈 Citations: 1
Influential: 0
🤖 AI Summary
The suitability and risks of large language models (LLMs) in anxiety support contexts remain poorly understood. Method: We systematically evaluated GPT-4 and Llama-2/3 using real r/Anxiety subreddit posts, applying prompt engineering and supervised fine-tuning, and introduced a multidimensional, interpretable evaluation framework assessing linguistic quality, safety (toxicity/bias), and supportive capacity (empathic expression, supportive discourse). Contribution/Results: We first demonstrate that fine-tuning on raw social media data improves fluency (+12%) but significantly degrades empathic responsiveness (−41% in empathic expression) and increases toxicity (+27%). GPT-series models consistently outperform Llama models in supportive capability. Based on these findings, we propose a dual-path optimization paradigm—“data purification + alignment constraints”—to enhance safety and trustworthiness. This work provides both methodological guidance and empirical evidence for the responsible deployment of LLMs in mental health applications.

📝 Abstract
The growing demand for accessible mental health support, compounded by workforce shortages and logistical barriers, has led to increased interest in utilizing Large Language Models (LLMs) for scalable and real-time assistance. However, their use in sensitive domains such as anxiety support remains underexamined. This study presents a systematic evaluation of LLMs (GPT and Llama) for their potential utility in anxiety support by using real user-generated posts from the r/Anxiety subreddit for both prompting and fine-tuning. Our approach utilizes a mixed-method evaluation framework incorporating three main categories of criteria: (i) linguistic quality, (ii) safety and trustworthiness, and (iii) supportiveness. Results show that fine-tuning LLMs with naturalistic anxiety-related data enhanced linguistic quality but increased toxicity and bias, and diminished emotional responsiveness. While LLMs exhibited limited empathy, GPT was evaluated as more supportive overall. Our findings highlight the risks of fine-tuning LLMs on unprocessed social media content without mitigation strategies.
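The three-category evaluation framework described above can be sketched as a scoring pipeline. This is a minimal, hypothetical illustration, not the paper's actual implementation: the scoring functions below are placeholder heuristics (the study uses real toxicity/bias classifiers and empathy ratings), and all names and thresholds are assumptions.

```python
from dataclasses import dataclass

@dataclass
class EvaluationResult:
    """Scores for one model response, each in [0, 1]."""
    linguistic_quality: float
    safety: float           # 1.0 = no toxicity/bias detected
    supportiveness: float   # empathic expression + supportive discourse

def score_linguistic_quality(response: str) -> float:
    # Placeholder heuristic: reward adequate length, cap very long replies.
    n_words = len(response.split())
    return min(1.0, n_words / 50) if n_words <= 200 else 0.5

def score_safety(response: str, blocklist=("stupid", "hopeless case")) -> float:
    # Placeholder: a real pipeline would use a toxicity/bias classifier.
    return 0.0 if any(term in response.lower() for term in blocklist) else 1.0

def score_supportiveness(
    response: str,
    empathy_cues=("i understand", "that sounds", "you're not alone"),
) -> float:
    # Placeholder: count empathic cue phrases; the paper instead rates
    # empathic expression and supportive discourse directly.
    hits = sum(cue in response.lower() for cue in empathy_cues)
    return min(1.0, hits / 2)

def evaluate(response: str) -> EvaluationResult:
    return EvaluationResult(
        linguistic_quality=score_linguistic_quality(response),
        safety=score_safety(response),
        supportiveness=score_supportiveness(response),
    )

result = evaluate(
    "That sounds really difficult. I understand, and you're not alone."
)
print(result)
```

Per-dimension scores like these make the framework interpretable: a fine-tuned model can improve on one axis (fluency) while regressing on another (safety, empathy), which is exactly the trade-off the paper reports.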
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs for anxiety support using social media data
Assessing linguistic quality, safety, and supportiveness of LLMs
Risks of fine-tuning LLMs on unprocessed social media content
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning LLMs with social media data
Mixed-method evaluation framework
Assessing linguistic quality and safety