🤖 AI Summary
Existing research lacks systematic evaluation of large language models' (LLMs) efficacy and safety in non-English mental health support. To address this, we construct the first multilingual dataset for mental health severity prediction—covering Greek, Turkish, French, Portuguese, German, and Finnish—using human-verified translations and error attribution analysis. We conduct zero-shot and few-shot cross-lingual evaluations of GPT and Llama series models. Our contributions are threefold: (1) we empirically reveal significant performance disparities across the six languages and correlate them with linguistic properties and training data coverage; (2) we identify high-risk misdiagnosis scenarios and propose a clinically safe, cost-effective multilingual adaptation framework; (3) we achieve over a 60% reduction in deployment costs. This work establishes a reproducible benchmark and an actionable pathway for deploying multilingual AI in mental health applications.
📝 Abstract
Large Language Models (LLMs) are increasingly being integrated into various medical fields, including mental health support systems. However, research on the effectiveness of LLMs in non-English mental health support applications remains limited. To address this gap, we present a novel multilingual adaptation of widely used mental health datasets, translated from English into six languages (Greek, Turkish, French, Portuguese, German, and Finnish). This dataset enables a comprehensive evaluation of LLM performance in detecting mental health conditions and assessing their severity across multiple languages. By experimenting with GPT and Llama models, we observe considerable variability in performance across languages, even though the models are evaluated on the same translated dataset. This inconsistency underscores the complexities inherent in multilingual mental health support, where language-specific nuances and mental health data coverage can affect model accuracy. Through comprehensive error analysis, we highlight the risks of relying exclusively on LLMs in medical settings (e.g., their potential to contribute to misdiagnoses). Moreover, our proposed approach offers significant cost savings for multilingual tasks, presenting a major advantage for broad-scale implementation.