AI Summary
This study addresses the lack of systematic evaluation of multilingual large language models (LLMs) on mental health tasks, particularly their cross-lingual generalization to non-English languages and machine-translated data. We present the first comprehensive assessment of multiple open- and closed-source LLMs across eight languages and their machine-translated versions, evaluating performance under zero-shot, few-shot, and fine-tuned settings against traditional NLP baselines. Our analysis further investigates the interplay between translation quality, language typology, and model performance. Results show that fine-tuned open-source models achieve competitive or even state-of-the-art F1 scores across several datasets, on par with closed-source counterparts. However, performance consistently degrades on machine-translated data, with the extent of degradation varying significantly across languages.
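To make the evaluation protocol concrete, below is a minimal sketch of how a zero-shot classification run could be scored with macro-F1, the metric reported above. The `llm_predict` function, label set, and toy data are illustrative placeholders, not the paper's actual pipeline.

```python
# Minimal sketch of zero-shot evaluation scored with macro-F1.
# `llm_predict` is a hypothetical placeholder for a prompted LLM call;
# it is NOT the paper's pipeline.
from sklearn.metrics import f1_score

LABELS = ["depression", "control"]  # illustrative binary label set

def llm_predict(text: str) -> str:
    """Placeholder: prompt an LLM with the post and map its answer to a label."""
    # A real implementation would call a model API here and parse the output.
    return "control"

def evaluate(texts: list[str], gold: list[str]) -> float:
    """Score zero-shot predictions against gold labels with macro-F1."""
    preds = [llm_predict(t) for t in texts]
    return f1_score(gold, preds, labels=LABELS, average="macro")

# Example usage with toy data:
texts = ["I can't get out of bed anymore", "Great run this morning!"]
gold = ["depression", "control"]
print(f"macro-F1: {evaluate(texts, gold):.2f}")
```

The same harness extends to few-shot runs by changing only the prompt construction inside the prediction function, which keeps the scoring comparable across settings.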
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities across NLP tasks. However, their performance in multilingual contexts, especially within the mental health domain, has not been thoroughly explored. In this paper, we evaluate proprietary and open-source LLMs on eight mental health datasets in various languages, as well as their machine-translated (MT) counterparts. We compare LLM performance in zero-shot, few-shot, and fine-tuned settings against conventional NLP baselines that do not employ LLMs. In addition, we assess translation quality across language families and typologies to understand its influence on LLM performance. Proprietary LLMs and fine-tuned open-source LLMs achieve competitive F1 scores on several datasets, often surpassing state-of-the-art results. However, performance on MT data is generally lower, and the extent of this decline varies by language and typology. This variation highlights both the strengths of LLMs in handling mental health tasks in languages other than English and their limitations when translation quality introduces structural or lexical mismatches.
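The abstract does not name the metric used to assess translation quality. As one plausible instantiation, corpus-level BLEU can be computed with the sacrebleu library when reference translations are available; the sentences below are invented toy data, and BLEU itself is an assumption, not necessarily the paper's metric.

```python
# Sketch of corpus-level MT quality scoring with sacrebleu.
# The paper does not specify its metric; BLEU here is an assumption,
# and the sentence pairs are toy data.
import sacrebleu

# Machine translations and one aligned human reference stream.
hypotheses = ["I feel empty all the time.", "Nothing helps anymore."]
references = [["I feel empty all of the time.", "Nothing helps any more."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}")  # higher = closer to the references
```

Scoring each language's MT output this way would let translation quality be correlated with the per-language F1 degradation the paper reports.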