PsihoRo: Depression and Anxiety Romanian Text Corpus

📅 2026-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the scarcity of open-source mental health text corpora in Romanian, which has hindered natural language processing (NLP) research on depression and anxiety. To bridge this gap, the authors collected textual responses from 205 participants to a questionnaire of six open-ended questions, administered alongside the standardized PHQ-9 and GAD-7 screening scales. They publicly released the result as PsihoRo, the first open-source Romanian mental health corpus. Using statistical analysis, a Romanian adaptation of the LIWC dictionary, emotion detection, and topic modeling, the study identifies linguistic markers significantly associated with depression and anxiety. The work fills a gap in psycholinguistic NLP resources for Romanian and provides a foundational dataset for future research on mental health text analysis.
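The corpus pairs each respondent's texts with PHQ-9 and GAD-7 scores. As a rough illustration of how such scores are usually turned into labels, the sketch below applies the standard scoring rules: PHQ-9 has 9 items and GAD-7 has 7, each rated 0 to 3, with the conventional severity bands. The `cutoff=10` flag is a commonly used screening threshold for both scales and is an assumption here; the summary does not state which thresholds, if any, the authors used. Function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple, Union

# Conventional severity bands, listed as (lower bound, label), highest first.
PHQ9_BANDS = [(20, "severe"), (15, "moderately severe"), (10, "moderate"), (5, "mild"), (0, "minimal")]
GAD7_BANDS = [(15, "severe"), (10, "moderate"), (5, "mild"), (0, "minimal")]


def total_score(items: Sequence[int], n_items: int) -> int:
    """Sum item responses after checking the item count and the 0-3 range."""
    if len(items) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(items)}")
    if any(not 0 <= x <= 3 for x in items):
        raise ValueError("each item must be scored 0-3")
    return sum(items)


def severity(score: int, bands: Sequence[Tuple[int, str]]) -> str:
    """Return the label of the first band whose lower bound the score reaches."""
    for lower, label in bands:
        if score >= lower:
            return label
    raise ValueError("score must be non-negative")


def screen(phq9_items: Sequence[int], gad7_items: Sequence[int],
           cutoff: int = 10) -> Dict[str, Union[int, str, bool]]:
    """Hypothetical helper: score both questionnaires and flag scores >= cutoff."""
    phq = total_score(phq9_items, 9)
    gad = total_score(gad7_items, 7)
    return {
        "phq9": phq,
        "phq9_severity": severity(phq, PHQ9_BANDS),
        "depression_flag": phq >= cutoff,  # assumed threshold, not from the paper
        "gad7": gad,
        "gad7_severity": severity(gad, GAD7_BANDS),
        "anxiety_flag": gad >= cutoff,  # assumed threshold, not from the paper
    }
```

For example, `screen([1, 2, 1, 0, 3, 1, 2, 0, 1], [1, 0, 1, 1, 0, 1, 0])` gives a PHQ-9 total of 11 ("moderate", flagged) and a GAD-7 total of 4 ("minimal", not flagged). Keeping the raw totals alongside the labels also allows correlational analyses that do not depend on any single cutoff.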

📝 Abstract
Psychological corpora in NLP are collections of texts used to analyze human psychology, emotions, and mental health. These texts allow researchers to study psychological constructs, detect mental health issues, and analyze emotional language. However, mental health data can be difficult to collect correctly from social media because of the assumptions made by those collecting it. A more pragmatic strategy is to gather data through open-ended questions and then assess it with self-report screening surveys. This method has been employed successfully for English, a language with abundant psychological NLP resources. The same cannot be said for Romanian, which currently has no open-source mental health corpus. To address this gap, we have created the first corpus for depression and anxiety in Romanian, using a form with 6 open-ended questions along with the standardized PHQ-9 and GAD-7 screening questionnaires. Although it may seem small, with texts from 205 respondents, PsihoRo is a first step towards understanding and analyzing texts about the mental health of the Romanian population. We employ statistical analysis, text analysis using Romanian LIWC, emotion detection, and topic modeling to identify the most important features of this newly introduced resource for the NLP community.
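LIWC-style analysis scores a text by the percentage of its words that fall into each dictionary category, where an entry ending in `*` matches any word with that prefix. The sketch below shows that counting scheme on a tiny hypothetical dictionary. The entries in `TOY_DICT` are illustrative Romanian words chosen for this example, NOT taken from the actual Romanian LIWC, and the simple `\w+` tokenizer is an assumption.

```python
import re
import unicodedata
from collections import Counter
from typing import Dict, List

# Hypothetical mini-dictionary in LIWC style (illustrative only, not the
# real Romanian LIWC). A trailing "*" matches any suffix.
TOY_DICT: Dict[str, List[str]] = {
    "i": ["eu", "mă", "meu", "mea", "mie"],
    "negemo": ["trist*", "frică", "îngrijor*", "singur*"],
    "posemo": ["bucur*", "fericit*", "liniștit*"],
}

TOKEN_RE = re.compile(r"\w+")  # \w covers Romanian letters such as ă, î, ș, ț


def tokenize(text: str) -> List[str]:
    """Lowercase, normalize diacritics to NFC, and split into word tokens."""
    return TOKEN_RE.findall(unicodedata.normalize("NFC", text.lower()))


def matches(token: str, pattern: str) -> bool:
    """Exact match, or prefix match when the pattern ends with '*'."""
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern


def category_percentages(text: str, dictionary: Dict[str, List[str]]) -> Dict[str, float]:
    """Percentage of tokens matching each category (a token may hit several)."""
    tokens = tokenize(text)
    if not tokens:
        return {cat: 0.0 for cat in dictionary}
    counts: Counter = Counter()
    for tok in tokens:
        for cat, patterns in dictionary.items():
            if any(matches(tok, p) for p in patterns):
                counts[cat] += 1
    return {cat: 100.0 * counts[cat] / len(tokens) for cat in dictionary}
```

On "Eu mă simt trist și singur." (6 tokens), the first-person and negative-emotion categories each cover 2 tokens (about 33.3%) and the positive-emotion category covers none. Per-respondent percentages like these can then be correlated with the PHQ-9 and GAD-7 totals.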
Problem

Research questions and friction points this paper is trying to address.

mental health corpus
Romanian language
depression
anxiety
NLP
Innovation

Methods, ideas, or system contributions that make the work stand out.

Romanian mental health corpus
PHQ-9 and GAD-7 integration
open-ended question methodology
psychological NLP resource
emotion detection in Romanian