Detecting Stealth Sycophancy in Mental-Health Dialogue with Dynamic Emotional Signature Graphs

📅 2026-05-05
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the challenge of reliably evaluating therapeutic quality in mental-health dialogues, in particular the detection of stealth sycophancy, a subtle form of undue alignment by conversational agents. To this end, the authors propose Dynamic Emotional Signature Graphs (DESG), a model-agnostic offline evaluation framework that represents dialogue windows with decoupled clinical states and scores them using asymmetric clinical geometry. The scoring is asymmetric because the target label depends on clinical direction: whether a response moves the user toward regulation or reframing, leaves the state broadly unchanged, or reinforces deterioration. The framework combines a clinical state manifold with graph-based trajectory analysis and an ensemble variant. According to the authors' ablations, the state manifold supplies most of the discriminative power, while the graph components contribute asymmetric scoring and interpretable diagnostics. On a diagnostic stress-test benchmark of 3,000 dialogue windows, DESG-Ensemble reaches a macro-F1 of 0.9353 on the 600-window held-out test set, ahead of ConcatANN (+1.51 points), BERTScore (+19.63 points), and TRACT (+33.81 points), without using large language models as final judges.
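The headline result is a macro-F1 over the three labels (harmful, productive, neutral): the unweighted mean of per-class F1 scores, so each class counts equally regardless of how many windows it contains. The following is a minimal pure-Python sketch of that metric. The label strings and the toy predictions are illustrative assumptions, not the paper's data or evaluation code.

```python
from collections import Counter

# Assumed label set, following the three classes named in the abstract.
LABELS = ("harmful", "neutral", "productive")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 over `labels`."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    f1s = []
    for c in labels:
        # F1 = 2TP / (2TP + FP + FN); define as 0 when the class never appears.
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1s) / len(labels)


# Toy example: one harmful window is mislabeled as neutral.
y_true = ["harmful", "harmful", "neutral", "productive"]
y_pred = ["harmful", "neutral", "neutral", "productive"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778 (= 7/9)
```

Because the mistake lowers both the harmful and the neutral F1, a single error costs more macro-F1 than it would plain accuracy. This is one reason the metric is informative when harmful responses must not be absorbed into a neutral majority.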
📝 Abstract
As conversational AI therapists are increasingly used in psychological support settings, reliable offline evaluation of therapeutic response quality remains an open problem. This paper studies multi-domain support-dialogue evaluation without relying on large language models as final judges. We use a direct LLM judge as a baseline that reads raw dialogue text and predicts whether the target response is harmful, productive, or neutral. We find that direct LLM judges and symmetric text-similarity metrics are poorly aligned with therapeutic quality because the target label depends on clinical direction: whether the response moves the user state toward regulation or reframing, leaves it broadly unchanged, or reinforces deterioration through higher risk affect or cognitive-distortion mass. To address this issue, we propose Dynamic Emotional Signature Graphs (DESG), a model-agnostic evaluator that represents dialogue windows with decoupled clinical states and scores them using asymmetric clinical geometry. We evaluate DESG on a constructed diagnostic stress-test benchmark of 3,000 dialogue windows from EmpatheticDialogues, ESConv, and CRADLE-Dialogue, covering peer support, counseling dialogue, and crisis-oriented interaction. On the 600-window held-out test aggregate, DESG-Ensemble achieves 0.9353 macro-F1, exceeding ConcatANN by 1.51 percentage points, BERTScore by 19.63 points, and TRACT by 33.81 points. Feature ablations, artifact controls, a 100-window blinded adjudicator audit, and qualitative disagreement cases indicate that the clinical state manifold is the main discriminative substrate, while graph-based trajectory components provide asymmetric scoring and interpretable diagnostics rather than serving as the sole source of performance.
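The abstract argues that symmetric similarity cannot capture clinical direction. A toy sketch makes the point concrete. This is not the DESG evaluator: the three state coordinates (`risk_affect`, `distortion_mass`, `regulation`), the `harm_weight` asymmetry factor, and the `tau` threshold are all hypothetical choices made for illustration. The idea is that an improvement and a deterioration of equal size are equidistant under Euclidean distance, while a signed, asymmetrically weighted score separates them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicalState:
    # Hypothetical decoupled state coordinates, each assumed to lie in [0, 1].
    risk_affect: float
    distortion_mass: float
    regulation: float


def symmetric_distance(a, b):
    """Euclidean distance between states: blind to the direction of change."""
    return math.dist(
        (a.risk_affect, a.distortion_mass, a.regulation),
        (b.risk_affect, b.distortion_mass, b.regulation),
    )


def directional_score(before, after, harm_weight=2.0):
    """Signed movement: positive means toward regulation/reframing,
    negative means deterioration. Worsening on any axis is scaled by
    harm_weight (a hypothetical asymmetry parameter)."""
    # Per-axis gains: risk and distortion should fall, regulation should rise.
    gains = (
        before.risk_affect - after.risk_affect,
        before.distortion_mass - after.distortion_mass,
        after.regulation - before.regulation,
    )
    return sum(g if g >= 0 else harm_weight * g for g in gains)


def label(before, after, tau=0.1, harm_weight=2.0):
    """Map the signed score to the three classes used in the paper."""
    s = directional_score(before, after, harm_weight)
    if s > tau:
        return "productive"
    if s < -tau:
        return "harmful"
    return "neutral"


start = ClinicalState(risk_affect=0.6, distortion_mass=0.5, regulation=0.3)
better = ClinicalState(risk_affect=0.4, distortion_mass=0.3, regulation=0.5)
worse = ClinicalState(risk_affect=0.8, distortion_mass=0.7, regulation=0.1)

# Same symmetric distance (about 0.346) in both directions...
print(symmetric_distance(start, better), symmetric_distance(start, worse))
# ...but opposite labels once direction is taken into account.
print(label(start, better), label(start, worse))  # productive harmful
```

Weighting losses more heavily than gains means that a response which soothes on one axis while reinforcing a distortion on another nets out negative. This is roughly the pattern one would want a stealth-sycophancy detector to flag, though the paper's actual geometry may differ.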
Problem

Research questions and friction points this paper is trying to address.

stealth sycophancy
mental-health dialogue
therapeutic response evaluation
offline evaluation
clinical direction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Emotional Signature Graphs
model-agnostic evaluation
asymmetric clinical geometry
therapeutic dialogue assessment
clinical state manifold