🤖 AI Summary
Large language models (LLMs) deployed in mental health support often generate responses that are overly directive, clinically inconsistent, or high-risk, while lacking runtime transparency and accountability. To address this, the authors propose a dual-agent framework in which a Responder generates conversational replies and a Judge, guided by the clinically validated MITI-4 coding system, performs real-time evaluation to decide whether to ALLOW or REVISE each response. This approach embeds MITI-4 criteria directly into the runtime architecture, enabling interpretable, intervenable dynamic supervision. Empirical evaluation demonstrates significant improvements in dialogue quality across dimensions such as collaboration, evocative questioning, and relational rapport. Expert qualitative assessment further corroborates the efficacy of this runtime clinical oversight mechanism, marking the first integration of MITI-4 into an operational LLM safety framework.
📝 Abstract
Large language models (LLMs) are increasingly used for mental health support, yet they can produce responses that are overly directive, inconsistent, or clinically misaligned, particularly in sensitive or high-risk contexts. Existing approaches to mitigating these risks largely rely on implicit alignment through training or prompting, offering limited transparency and runtime accountability. We introduce PAIR-SAFE, a paired-agent framework for auditing and refining AI-generated mental health support that integrates a Responder agent with a supervisory Judge agent grounded in the clinically validated Motivational Interviewing Treatment Integrity (MITI-4) framework. The Judge audits each response and provides structured ALLOW or REVISE decisions that guide runtime response refinement. We simulate counseling interactions using a support-seeker simulator derived from human-annotated motivational interviewing data. We find that Judge-supervised interactions show significant improvements in key MITI dimensions, including Partnership, Seek Collaboration, and overall Relational quality. Our quantitative findings are supported by qualitative expert evaluation, which further highlights the nuances of runtime supervision. Together, our results reveal that such a paired-agent approach can provide clinically grounded auditing and refinement for AI-assisted conversational mental health support.