🤖 AI Summary
Accurately identifying depression and anxiety from real-world text remains challenging for conventional NLP models, because such data is noisy, brief, limited in sample size, and mixed in genre. Method: This study conducts a systematic comparative evaluation of traditional machine learning, psycholinguistically informed lightweight encoder models (e.g., BERT variants), and large language models (LLMs) across five diverse clinical and unstructured text datasets. Contribution/Results: LLMs achieve substantial gains (8–15% absolute accuracy improvement) in low-resource, noisy, and genre-mixed settings, demonstrating robustness under realistic constraints. Conversely, on high-quality, clinically validated texts, lightweight encoders leveraging psycholinguistic features attain 92.3% F1, nearly matching the best LLM (93.1%), while offering superior computational efficiency and model interpretability. This work is the first to empirically delineate the contextual boundaries of LLM superiority in mental health text classification and to validate the competitive performance of interpretable, feature-driven models on critical clinical subsets.
📝 Abstract
This paper compares the effectiveness of traditional machine learning methods, encoder-based models, and large language models (LLMs) on the task of detecting depression and anxiety. Five datasets were considered, each differing in format and in the method used to define the target pathology class. As pathology classifiers, we tested AutoML models based on linguistic features, several variants of encoder-based Transformers such as BERT, and state-of-the-art LLMs. The results show that LLMs outperform traditional methods, particularly on noisy and small datasets where training examples vary significantly in text length and genre. However, psycholinguistic features and encoder-based models can achieve performance comparable to LLMs when trained on texts from individuals with clinically confirmed depression, highlighting their potential effectiveness in targeted clinical applications.