Why Synthetic Isn't Real Yet: A Diagnostic Framework for Contact Center Dialogue Generation

📅 2025-08-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Customer service dialogue research is hindered by privacy constraints and the scarcity of authentic corpora, which impede both high-quality synthetic data generation and its evaluation. To capture the goal-oriented nature, role asymmetry, ASR-induced noise, and regulatory compliance requirements of these conversations, we propose a multi-stage, feature-aware synthesis method grounded in intent summarization, topic flow modeling, and QA-formatted structuring. We also introduce the first multilingual diagnostic framework tailored to customer service scenarios, comprising 18 linguistically and behaviorally grounded metrics that enable fine-grained, reference-free assessment of disfluency, emotional consistency, and agent behavioral realism. Empirical analysis reveals systematic biases across existing generation methods: none fully approximates real-world dialogue, with pronounced deficiencies in emotional and behavioral modeling. Our diagnostic suite provides an interpretable, reproducible evaluation pathway for advancing synthetic dialogue systems.
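The multi-stage, feature-aware synthesis described above can be sketched as a staged prompting pipeline. This is a minimal illustration under assumptions: the attribute fields and prompt wording below are hypothetical, not the authors' actual implementation.

```python
from dataclasses import dataclass

# Hypothetical container for the derived call attributes named in the paper:
# an intent summary, a topic flow, and a QA evaluation form.
@dataclass
class CallAttributes:
    intent_summary: str
    topic_flow: list   # ordered topics the call should traverse
    qa_form: dict      # compliance items the agent must satisfy

def build_stage_prompts(attrs: CallAttributes) -> list:
    """Turn derived attributes into staged generation prompts.

    Stage 1 conditions on the caller's intent, stage 2 expands the
    topic flow into the dialogue's trajectory, and stage 3 constrains
    the agent's turns with the QA form's compliance items.
    """
    stage1 = (
        "Write the opening turns of a contact-center call where the "
        f"customer's intent is: {attrs.intent_summary}"
    )
    stage2 = (
        "Continue the dialogue so that it covers these topics in order: "
        + " -> ".join(attrs.topic_flow)
    )
    stage3 = (
        "Revise the agent's turns so each QA item is satisfied: "
        + "; ".join(f"{k}: {v}" for k, v in attrs.qa_form.items())
    )
    return [stage1, stage2, stage3]

attrs = CallAttributes(
    intent_summary="dispute a duplicate charge on last month's bill",
    topic_flow=["identity verification", "charge lookup", "refund offer"],
    qa_form={"greeting": "agent states name", "disclosure": "call is recorded"},
)
prompts = build_stage_prompts(attrs)
```

Each stage's output would feed an LLM call in practice; the sketch only shows how the three derived attributes decompose generation into supervised steps.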

๐Ÿ“ Abstract
Synthetic transcript generation is critical in contact center domains, where privacy and data scarcity limit model training and evaluation. Unlike prior synthetic dialogue generation work on open-domain or medical dialogues, contact center conversations are goal-oriented, role-asymmetric, and behaviorally complex, featuring disfluencies, ASR noise, and compliance-driven agent actions. In deployments where transcripts are unavailable, standard pipelines still yield derived call attributes such as Intent Summaries, Topic Flow, and QA Evaluation Forms. We leverage these as supervision signals to guide generation. To assess the quality of such outputs, we introduce a diagnostic framework of 18 linguistically and behaviorally grounded metrics for comparing real and synthetic transcripts. We benchmark four language-agnostic generation strategies, from simple prompting to characteristic-aware multi-stage approaches, alongside reference-free baselines. Results reveal persistent challenges: no method excels across all traits, with notable deficits in disfluency, sentiment, and behavioral realism. Our diagnostic tool exposes these gaps, enabling fine-grained evaluation and stress testing of synthetic dialogue across languages.
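One of the abstract's reference-free diagnostics, disfluency, can be illustrated with a minimal sketch. The filler lexicon and the exact metric definition below are assumptions for illustration, not the paper's actual 18-metric specification.

```python
import re

# Assumed filler lexicon; the paper does not specify one here.
FILLERS = {"uh", "um", "erm", "hmm", "like", "you know", "i mean"}

def disfluency_rate(transcript: str) -> float:
    """Filler occurrences per 100 tokens, a crude proxy for spoken realism.

    Synthetic transcripts that read 'too clean' score near zero, while
    real transcripts of spoken calls typically score much higher.
    """
    tokens = re.findall(r"[a-z']+", transcript.lower())
    padded = " " + " ".join(tokens) + " "
    # Multiword fillers are matched on the space-joined token stream;
    # single-word fillers are matched token by token.
    multi = sum(padded.count(f" {f} ") for f in FILLERS if " " in f)
    single = sum(1 for t in tokens if t in FILLERS)
    return 100.0 * (multi + single) / max(len(tokens), 1)
```

Because the metric needs no reference transcript, it can score real and synthetic corpora on the same scale and expose the "too clean" gap the paper reports.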
Problem

Research questions and friction points this paper is trying to address.

Generating realistic synthetic contact center dialogues for training
Addressing data scarcity and privacy constraints in dialogue generation
Evaluating synthetic dialogue quality with linguistically-grounded metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging derived call attributes as supervision signals
Introducing a diagnostic framework with 18 metrics
Benchmarking language-agnostic multi-stage generation strategies
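Benchmarking generation strategies against real data reduces to comparing per-metric score distributions between corpora. A minimal sketch, assuming hypothetical metric names and a simple mean-gap comparison (the paper's actual aggregation is not specified here):

```python
from statistics import mean

def metric_gaps(real: dict, synth: dict) -> dict:
    """Absolute difference of per-corpus metric means, per metric.

    `real` and `synth` map metric name -> list of per-dialogue scores.
    Larger gaps flag traits where synthesis drifts from real calls.
    """
    return {
        m: abs(mean(real[m]) - mean(synth[m]))
        for m in real.keys() & synth.keys()
    }

# Hypothetical scores for two of the suite's traits.
real = {"disfluency_rate": [12.0, 9.5, 11.0], "agent_turn_share": [0.55, 0.6]}
synth = {"disfluency_rate": [1.0, 0.5, 2.0], "agent_turn_share": [0.5, 0.58]}
gaps = metric_gaps(real, synth)
```

Here the large disfluency gap and small turn-share gap would mirror the paper's finding that surface structure is easier to reproduce than spoken-language realism.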
Rishikesh Devanathan
Observe.AI, Bangalore, India
Varun Nathan
Observe.AI, Bangalore, India
Ayush Kumar