🤖 AI Summary
This study systematically evaluates the clinical fidelity of synthetic PTSD Prolonged Exposure (PE) therapy dialogues to determine whether they can substitute for real clinical data in model training and evaluation. Method: We propose the first PE-specific fidelity assessment framework, integrating linguistic profiling, dialogue structure modeling, protocol compliance checking, and semantic similarity analysis, and introducing protocol-sensitive linguistic and semantic metrics that go beyond conventional fluency-oriented evaluation. Contribution/Results: Experiments show that synthetic dialogues replicate basic structural properties (e.g., turn-taking ratio: 0.98 vs. 0.99 in real data) but fall short on core clinical dimensions, including dynamic distress monitoring and phase-appropriate therapeutic alignment, exposing blind spots in current synthetic-data evaluation paradigms. Our work establishes a reusable methodological benchmark and identifies concrete directions for improving clinical dialogue generation and fidelity assessment.
📝 Abstract
The growing adoption of synthetic data in healthcare is driven by privacy concerns, limited access to real-world data, and the high cost of annotation. This work explores the use of synthetic Prolonged Exposure (PE) therapeutic conversations for Post-Traumatic Stress Disorder (PTSD) as a scalable alternative for training and evaluating clinical models. We systematically compare real and synthetic dialogues using linguistic, structural, and protocol-specific metrics, including turn-taking patterns and treatment fidelity. We also introduce and evaluate PE-specific metrics derived from linguistic analysis and semantic modeling, offering a novel framework for assessing clinical fidelity beyond surface fluency. Our findings show that although synthetic data holds promise for mitigating data scarcity and protecting patient privacy, it can struggle to capture the subtle dynamics of therapeutic interactions. In our dataset, synthetic dialogues match structural features of real-world dialogues (e.g., speaker switch ratio: 0.98 vs. 0.99); however, they do not adequately reflect key fidelity markers such as distress monitoring. We highlight gaps in existing evaluation frameworks and advocate for fidelity-aware metrics that uncover clinically significant failures. Our findings clarify where synthetic data can effectively complement real-world datasets and where critical limitations remain.
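To illustrate the kind of structural metric cited above (speaker switch ratio), the sketch below computes the fraction of adjacent turn pairs where the speaker changes. This is a minimal example under assumed conventions: the function name `speaker_switch_ratio` and the `(speaker, utterance)` list format are hypothetical, and the paper's exact metric definition may differ.

```python
from typing import List, Tuple

def speaker_switch_ratio(turns: List[Tuple[str, str]]) -> float:
    """Fraction of adjacent turn pairs in which the speaker changes.

    `turns` is a list of (speaker_label, utterance) pairs, e.g.
    [("therapist", "..."), ("patient", "..."), ...]. A ratio near 1.0
    means the speakers alternate on almost every turn.
    """
    if len(turns) < 2:
        return 0.0
    switches = sum(
        1 for (prev_spk, _), (curr_spk, _) in zip(turns, turns[1:])
        if prev_spk != curr_spk
    )
    return switches / (len(turns) - 1)

# Usage: a perfectly alternating therapist/patient exchange yields 1.0
dialogue = [
    ("therapist", "How has your week been?"),
    ("patient", "Difficult. The nightmares came back."),
    ("therapist", "Let's rate your distress on a 0-100 scale."),
    ("patient", "Around 70 right now."),
]
print(speaker_switch_ratio(dialogue))  # -> 1.0
```

A high switch ratio alone only captures alternation of speakers; as the abstract notes, it says nothing about clinically important behaviors such as in-session distress monitoring, which is why protocol-specific fidelity metrics are needed alongside structural ones.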