DS@GT at eRisk 2025: From prompts to predictions, benchmarking early depression detection with conversational agent based assessments and temporal attention models

📅 2025-07-14

🤖 AI Summary

This study addresses early depression screening with a dialogue-based assessment method that does not rely on manually annotated labels. It introduces a prompt template aligned with the Beck Depression Inventory-II (BDI-II) criteria that guides large language models (LLMs) through a structured psychological assessment. Because ground-truth labels were unavailable, the JSON-formatted outputs were checked through cross-model agreement analysis and intra-response consistency validation, and the analysis examined which conversational cues influenced symptom predictions. The work also benchmarks temporal attention models for early depression detection. The key contribution is a clinically aligned prompting framework supported by multi-model verification, aimed at making assessments both interpretable and reliable. In the eRisk 2025 Pilot Task, the best submission ranked second on the official leaderboard, achieving DCHR = 0.50, ADODL = 0.89, and ASHR = 0.27.

📝 Abstract

This Working Note summarizes the participation of the DS@GT team in two eRisk 2025 challenges. For the Pilot Task on conversational depression detection with large language models (LLMs), we adopted a prompt-engineering strategy in which diverse LLMs conducted BDI-II-based assessments and produced structured JSON outputs. Because ground-truth labels were unavailable, we evaluated cross-model agreement and internal consistency. Our prompt design methodology aligned model outputs with BDI-II criteria and enabled the analysis of conversational cues that influenced the prediction of symptoms. Our best submission, second on the official leaderboard, achieved DCHR = 0.50, ADODL = 0.89, and ASHR = 0.27.
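To give a feel for what an ADODL score of 0.89 measures, the sketch below computes the metric under an assumption: that ADODL follows the definition used in earlier eRisk questionnaire tasks, where each prediction is scored as (63 - |predicted total - reference total|) / 63 and the scores are averaged. The function names `dodl` and `adodl` are illustrative and do not come from the paper.

```python
from statistics import mean

MAX_BDI_TOTAL = 63  # 21 BDI-II items, each scored 0-3


def dodl(predicted_total: int, actual_total: int) -> float:
    """Closeness of one predicted BDI-II total to the reference total, in [0, 1].

    Assumed definition (earlier eRisk editions), not confirmed by this summary.
    """
    return (MAX_BDI_TOTAL - abs(predicted_total - actual_total)) / MAX_BDI_TOTAL


def adodl(predicted: list[int], actual: list[int]) -> float:
    """Average DODL over all assessed conversations."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equally long")
    return mean(dodl(p, a) for p, a in zip(predicted, actual))
```

Under this assumed definition, a score near 0.89 corresponds to predicted totals that miss the reference by about 7 points on the 0-63 scale on average.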

Problem

Research questions and friction points this paper is trying to address.

Early depression detection using conversational agent assessments

Benchmarking LLM-based prompt engineering for BDI-II symptom analysis

Evaluating cross-model agreement without ground-truth depression labels
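One simple way to quantify cross-model agreement without labels is the share of questionnaire items on which two models assign the same score. The sketch below is a minimal illustration of that idea, not the team's actual procedure; the function name `pairwise_agreement` and the model names are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(
    scores_by_model: dict[str, list[int]],
) -> dict[tuple[str, str], float]:
    """Fraction of items with identical scores, for every pair of models."""
    result = {}
    for (name_a, a), (name_b, b) in combinations(scores_by_model.items(), 2):
        if not a or len(a) != len(b):
            raise ValueError("score lists must be non-empty and equally long")
        matches = sum(x == y for x, y in zip(a, b))
        result[(name_a, name_b)] = matches / len(a)
    return result


# Hypothetical item scores from three models (shortened to 4 items for brevity).
example = {
    "model_a": [0, 1, 2, 3],
    "model_b": [0, 1, 2, 0],
    "model_c": [0, 1, 2, 3],
}
agreement = pairwise_agreement(example)
```

Exact-match agreement is strict; a tolerance of one point per item or a chance-corrected statistic would be reasonable alternatives depending on how the scores are used.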

Innovation

Methods, ideas, or system contributions that make the work stand out.

Prompt-engineering strategy with diverse LLMs

BDI-II-based assessments producing structured JSON

Temporal attention models analyzing conversational cues
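Structured JSON output makes internal-consistency checks straightforward. The sketch below assumes a hypothetical response schema with an `items` object mapping item numbers "1" to "21" onto scores from 0 to 3, plus a `total` field; the paper's actual schema is not given in this summary, so the field names and the function `validate_assessment` are illustrative only.

```python
import json

N_ITEMS = 21  # BDI-II has 21 items, each scored 0-3


def validate_assessment(raw: str) -> list[str]:
    """Return a list of consistency problems; an empty list means the response passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    items = data.get("items")
    if not isinstance(items, dict):
        return ["missing 'items' object"]

    problems = []
    expected_keys = {str(i) for i in range(1, N_ITEMS + 1)}
    if set(items) != expected_keys:
        problems.append("item keys must be exactly '1' through '21'")
    # bool is a subclass of int in Python, so exclude it explicitly.
    bad = [
        key
        for key, value in items.items()
        if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 3
    ]
    if bad:
        problems.append(f"scores outside 0-3 for items: {sorted(bad)}")
    elif data.get("total") != sum(items.values()):
        problems.append("reported total does not equal the sum of item scores")
    return problems
```

Responses that fail such a check could be re-requested or excluded before comparing models, so that disagreements reflect differing judgments and not malformed output.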
