AI Summary
This study addresses the limitations of manual psychological crisis assessment in hotline services, which suffers from high subjectivity and low efficiency, hindering accurate identification of callers' crisis severity. To overcome these challenges, the authors propose an automated crisis-level classification framework based on large language models that innovatively integrates nonverbal emotional cues from speech with interpretable diagnostic reasoning chains. Specifically, prosodic information is injected to embed acoustic affective features into textual representations, and a reasoning-augmented training strategy is designed to enable the model to generate structured inference processes alongside classification decisions. Experimental results demonstrate that the proposed method achieves a macro F1-score of 0.802 and an accuracy of 0.805 on a three-class crisis-level task, significantly outperforming existing automated approaches and thereby enhancing both the quality and scalability of psychological hotline assessments.
Abstract
Psychological support hotlines provide critical support for individuals experiencing mental health emergencies, yet current assessments largely rely on human operators whose judgments may vary with professional experience and are constrained by limited staffing resources. This paper proposes a large language model (LLM)-based framework for automated crisis level classification, a key indicator that supports many downstream tasks and improves the overall quality of hotline services. To better capture emotional signals in spoken conversations, we introduce a paralinguistic injection method that inserts identified non-verbal emotional cues into speech transcripts, enabling LLM-based reasoning to incorporate critical acoustic nuances. In addition, we propose a reasoning-enhanced training strategy that trains the model to generate diagnostic reasoning chains as an auxiliary task, which serves as a regulariser to improve classification performance. Combined with data augmentation, our final system achieves a macro F1-score of 0.802 and an accuracy of 0.805 on the three-class classification task under 5-fold cross-validation.
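To make the paralinguistic injection idea concrete, the following is a minimal sketch of how detected non-verbal cues might be inserted into a transcript before it is passed to an LLM. The abstract does not specify the tag format, the cue vocabulary, or how cues are detected, so the `Utterance` class, the `inject_cues` function, the bracketed tag style, and the cue labels are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Utterance:
    """One transcribed turn (hypothetical structure, not from the paper)."""
    speaker: str                  # e.g. "caller" or "operator"
    text: str                     # transcribed words
    cues: Tuple[str, ...] = ()    # assumed output of an upstream cue detector


def inject_cues(utterances: List[Utterance]) -> str:
    """Render a transcript with non-verbal cues inlined as bracketed tags.

    The "[cue]" prefix format is an assumption; the paper may use another.
    """
    lines = []
    for u in utterances:
        # Each detected cue becomes a bracketed tag placed before the words.
        tags = "".join(f"[{cue}] " for cue in u.cues)
        lines.append(f"{u.speaker}: {tags}{u.text}")
    return "\n".join(lines)


transcript = inject_cues([
    Utterance("operator", "How are you feeling tonight?"),
    Utterance("caller", "I don't know anymore.", ("sobbing", "long pause")),
])
print(transcript)
```

The resulting text can be fed to a text-only LLM unchanged, which is the appeal of injecting cues at the transcript level rather than modifying the model's input modality.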