CounselReflect: A Toolkit for Auditing Mental-Health Dialogues

📅 2026-03-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of structured, transparent methods for auditing conversational mental-health support systems. The authors propose CounselReflect, an end-to-end auditing toolkit that combines model-based metrics with rubric-based evaluation, the latter covering both literature-derived and user-defined criteria. Task-specific predictors produce 12 model-based metrics, and configurable LLM judges operationalize a library of 69 literature-derived rubric metrics. The resulting reports include session-level summaries, turn-level scores, and evidence-linked excerpts. The toolkit is available as a web application, a browser extension, and a command-line interface (CLI). A user study with 20 participants and an expert review with 6 mental-health professionals suggest that it supports understandable, usable, and trustworthy auditing. The source code and a demo video are provided.

📝 Abstract

Mental-health support is increasingly mediated by conversational systems (e.g., LLM-based tools), but users often lack structured ways to audit the quality and potential risks of the support they receive. We introduce CounselReflect, an end-to-end toolkit for auditing mental-health support dialogues. Rather than producing a single opaque quality score, CounselReflect provides structured, multi-dimensional reports with session-level summaries, turn-level scores, and evidence-linked excerpts to support transparent inspection. The system integrates two families of evaluation signals: (i) 12 model-based metrics produced by task-specific predictors, and (ii) rubric-based metrics that extend coverage via a literature-derived library (69 metrics) and user-defined custom metrics, operationalized with configurable LLM judges. CounselReflect is available as a web application, browser extension, and command-line interface (CLI), enabling use in real-time settings as well as at scale. Human evaluation includes a user study with 20 participants and an expert review with 6 mental-health professionals, suggesting that CounselReflect supports understandable, usable, and trustworthy auditing. A demo video and full source code are also provided.
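The report structure described in the abstract (turn-level scores with evidence excerpts, rolled up into a session-level summary) can be sketched roughly as follows. This is a minimal illustration and not CounselReflect's actual code or API: the names `TurnScore`, `AuditReport`, `audit`, and `toy_judge` are hypothetical, scores are assumed to lie in [0, 1], the session summary is assumed to be a per-metric mean, and a simple phrase match stands in for a configurable LLM judge.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Tuple


@dataclass
class TurnScore:
    turn_index: int
    metric: str
    score: float   # assumed to be normalized to [0, 1]
    evidence: str  # excerpt from the turn that supports the score


@dataclass
class AuditReport:
    turn_scores: List[TurnScore] = field(default_factory=list)

    def session_summary(self) -> Dict[str, float]:
        """Aggregate turn-level scores into one mean score per metric (an assumption)."""
        by_metric: Dict[str, List[float]] = {}
        for ts in self.turn_scores:
            by_metric.setdefault(ts.metric, []).append(ts.score)
        return {name: mean(values) for name, values in by_metric.items()}


# A judge maps (rubric text, turn text) to (score, evidence excerpt).
Judge = Callable[[str, str], Tuple[float, str]]


def toy_judge(rubric: str, turn: str) -> Tuple[float, str]:
    """Hypothetical stand-in for an LLM judge: looks for one validating phrase."""
    phrase = "that sounds"
    start = turn.lower().find(phrase)
    if start == -1:
        return 0.0, ""
    return 1.0, turn[start:start + 40]


def audit(turns: List[str], rubrics: Dict[str, str], judge: Judge) -> AuditReport:
    """Score every turn against every rubric and collect the results."""
    report = AuditReport()
    for index, turn in enumerate(turns):
        for name, rubric in rubrics.items():
            score, evidence = judge(rubric, turn)
            report.turn_scores.append(TurnScore(index, name, score, evidence))
    return report


turns = [
    "That sounds really hard. I'm here to listen.",
    "You should just move on.",
]
rubrics = {"validation": "Does the response validate the user's feelings?"}
report = audit(turns, rubrics, toy_judge)
print(report.session_summary())
```

Because the judge is passed in as a plain callable, a user-defined metric in this sketch is just another entry in `rubrics`, and swapping the toy judge for a real model call would not change the report structure.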

Problem

Research questions and friction points this paper is trying to address.

mental-health dialogues

auditing

conversational systems

quality assessment

risk evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

mental-health dialogue auditing

structured evaluation

multi-dimensional reporting

configurable LLM judges

transparent AI assessment

🔎 Similar Papers

FAIIR: Building Toward A Conversational AI Agent Assistant for Youth Mental Health Service Provision

2024-05-28 · Citations: 1

COMPASS: Computational Mapping of Patient-Therapist Alliance Strategies with Language Modeling

2024-02-22 · arXiv.org · Citations: 1