🤖 AI Summary
This study addresses the automatic identification and fine-grained classification of delusion-related content (delusional beliefs, associated emotions, and behavioral responses) in audio diaries recorded in naturalistic settings. To this end, we propose an automated multi-agent large language model pipeline that ensembles three foundation models and combines detailed diagnostic prompts with majority voting. The detailed prompts reduce false positives for delusional themes, but they also constrain the interpretation of emotional and behavioral responses. Majority voting proves more robust than conversational debate between agents, which lowers accuracy on clinically ambiguous text. The approach achieves Micro F1 scores of 0.872 for delusion detection and 0.779 for multi-label classification, supporting its use as a validated and scalable pipeline.
📝 Abstract
Speech monologues recorded in naturalistic settings provide opportunities to characterize mental illness phenomenology and detect symptom exacerbation. Large language models (LLMs) offer new possibilities for automating this process, as they require annotated data primarily for evaluation rather than training. In this paper, we present a novel automated, multi-agent LLM pipeline for the fine-grained, multi-label extraction of language suggestive of delusional beliefs, associated affective responses, and behavioral responses from transcripts of naturalistic audio diaries collected from people with moderate persecutory ideation. Evaluating an ensemble of three foundation models, we demonstrate that detailed diagnostic prompt instructions successfully reduce false positives for delusional theme classification, but also constrain the interpretation of affective or behavioral responses. Furthermore, comparing multi-agent adjudication frameworks shows that complex conversational debate between agents diminishes accuracy on clinically ambiguous text by inducing premature consensus. Instead, majority voting establishes robust performance (Micro F1 of 0.872 and 0.779 for delusion detection and classification respectively). This work provides a validated and scalable pipeline for the automated detection and characterization of content suggesting delusional beliefs in naturalistic speech.
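As a rough illustration of the two mechanisms named above, the following minimal sketch shows majority voting over three agents' multi-label outputs and a micro-averaged F1 computation. The label names (`persecutory`, `fear`, `avoidance`), the function names, and the two-of-three threshold are assumptions made for the example; the abstract does not specify the label set or the exact voting rule.

```python
from collections import Counter


def majority_vote(agent_labels, min_votes=2):
    """Keep every label proposed by at least `min_votes` agents.

    `agent_labels` is a list with one set of labels per agent.
    The 2-of-3 threshold is an assumption, not taken from the paper.
    """
    counts = Counter(label for labels in agent_labels for label in set(labels))
    return {label for label, n in counts.items() if n >= min_votes}


def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-transcript label sets.

    True positives, false positives, and false negatives are pooled
    across all transcripts and labels before computing F1.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # labels present in both gold and prediction
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical outputs from three agents for a single transcript.
votes = [
    {"persecutory", "fear"},
    {"persecutory"},
    {"fear", "avoidance"},
]
ensemble = majority_vote(votes)  # "avoidance" has one vote and is dropped

# Hypothetical gold and predicted label sets for two transcripts.
gold = [{"persecutory", "fear"}, {"avoidance"}]
pred = [{"persecutory"}, {"avoidance", "fear"}]
score = micro_f1(gold, pred)
```

Because voting is applied per label, a label raised by only one agent is discarded without any exchange between agents, which is one plausible reading of why it avoids the premature consensus attributed to conversational debate.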