ADVOSYNTH: A Synthetic Multi-Advocate Dataset for Speaker Identification in Courtroom Scenarios

📅 2026-01-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of speaker identity confusion in high-fidelity synthetic speech within structured scenarios such as courtroom settings, where distinguishing between speakers is critical yet difficult. To support targeted research, the authors introduce Advosynth-500, the first multi-role synthetic speech dataset specifically designed for courtroom debates. It comprises 100 utterances organized into five adversarial debate pairs, generated using the Speech Llama Omni model and featuring 10 distinct synthetic lawyer personas with well-defined acoustic characteristics. This dataset establishes a new benchmark for speaker identification tasks involving synthetic speech, enabling systematic evaluation of modern systems’ ability to discern the origins of highly realistic synthetic voices in forensic and legal contexts.

📝 Abstract

As large-scale speech-to-speech models achieve high fidelity, distinguishing between synthetic voices in structured environments becomes a vital area of study. This paper introduces Advosynth-500, a specialized dataset comprising 100 synthetic speech files featuring 10 unique advocate identities. Using the Speech Llama Omni model, we simulate five distinct advocate pairs engaged in courtroom arguments. We define specific vocal characteristics for each advocate and present a speaker identification challenge to evaluate the ability of modern systems to map audio files to their respective synthetic origins. The dataset is available at https://github.com/naturenurtureelite/ADVOSYNTH-500.
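The speaker identification challenge described in the abstract amounts to mapping each audio file to one advocate identity. A minimal sketch of how such predictions might be scored is shown below. The file names, the advocate labels, and the `score_identification` helper are all hypothetical and are not taken from the paper or its repository; the abstract does not name a metric, so plain accuracy is an assumption here.

```python
from collections import defaultdict


def score_identification(truth, predicted):
    """Return (overall accuracy, per-advocate accuracy).

    Both arguments map a file name to an advocate label.
    Files missing from `predicted` are counted as errors.
    """
    if not truth:
        raise ValueError("truth must contain at least one file")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for name, speaker in truth.items():
        totals[speaker] += 1
        if predicted.get(name) == speaker:
            hits[speaker] += 1
    overall = sum(hits.values()) / len(truth)
    per_speaker = {s: hits[s] / totals[s] for s in totals}
    return overall, per_speaker


# Hypothetical file names and labels, for illustration only.
truth = {
    "clip_001.wav": "advocate_01",
    "clip_002.wav": "advocate_02",
    "clip_003.wav": "advocate_01",
    "clip_004.wav": "advocate_02",
}
predicted = {
    "clip_001.wav": "advocate_01",
    "clip_002.wav": "advocate_01",  # wrong advocate
    "clip_003.wav": "advocate_01",
    # clip_004.wav has no prediction and counts as an error
}

overall, per_speaker = score_identification(truth, predicted)
print(overall, per_speaker)
```

Reporting per-advocate accuracy alongside the overall figure would show whether errors concentrate on particular voices, for example the two members of a debate pair being confused with each other.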

Problem

Research questions and friction points this paper is trying to address.

speaker identification

synthetic speech

courtroom scenarios

voice identity

advocate

Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic speech

speaker identification

courtroom scenarios

Speech Llama Omni

multi-advocate dataset

🔎 Similar Papers

A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection

2024-09-23, arXiv.org, Citations: 1