ADVOSYNTH: A Synthetic Multi-Advocate Dataset for Speaker Identification in Courtroom Scenarios

2026-01-15
AI Summary
This work addresses speaker identity confusion in high-fidelity synthetic speech within structured scenarios such as courtroom settings, where distinguishing between speakers is critical yet difficult. To support targeted research, the authors introduce Advosynth-500, which they describe as the first multi-role synthetic speech dataset designed specifically for courtroom debates. It comprises 100 utterances from 10 distinct synthetic lawyer personas, grouped into five adversarial debate pairs and generated using the Speech Llama Omni model; each persona has well-defined acoustic characteristics. The dataset serves as a benchmark for speaker identification on synthetic speech, enabling systematic evaluation of modern systems' ability to trace highly realistic synthetic voices back to their origins in forensic and legal contexts.

Abstract
As large-scale speech-to-speech models achieve high fidelity, the distinction between synthetic voices in structured environments becomes a vital area of study. This paper introduces Advosynth-500, a specialized dataset comprising 100 synthetic speech files featuring 10 unique advocate identities. Using the Speech Llama Omni model, we simulate five distinct advocate pairs engaged in courtroom arguments. We define specific vocal characteristics for each advocate and present a speaker identification challenge to evaluate the ability of modern systems to map audio files to their respective synthetic origins. The dataset is available at https://github.com/naturenurtureelite/ADVOSYNTH-500.
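
As a rough illustration of how the speaker identification challenge described above could be scored, the following Python sketch computes closed-set accuracy and tallies misidentifications. The label names (`adv_01`, `adv_02`), the function names, and the choice of accuracy as the metric are assumptions made for illustration; the abstract does not specify the evaluation protocol.

```python
from collections import Counter


def identification_accuracy(true_ids, predicted_ids):
    """Return the fraction of files whose predicted advocate matches the true one."""
    if len(true_ids) != len(predicted_ids):
        raise ValueError("true_ids and predicted_ids must have the same length")
    if not true_ids:
        raise ValueError("at least one labelled file is required")
    correct = sum(t == p for t, p in zip(true_ids, predicted_ids))
    return correct / len(true_ids)


def confusion_counts(true_ids, predicted_ids):
    """Count each (true advocate, predicted advocate) pair among the errors."""
    return Counter(
        (t, p) for t, p in zip(true_ids, predicted_ids) if t != p
    )


# Hypothetical labels for four audio files; these are not taken from the dataset.
true_ids = ["adv_01", "adv_01", "adv_02", "adv_02"]
predicted_ids = ["adv_01", "adv_02", "adv_02", "adv_02"]

accuracy = identification_accuracy(true_ids, predicted_ids)
errors = confusion_counts(true_ids, predicted_ids)
```

Tallying errors by (true, predicted) pair is one way to check whether a system mostly confuses the two advocates within the same debate pair or confuses advocates across pairs.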
Problem

Research questions and friction points this paper is trying to address.

speaker identification
synthetic speech
courtroom scenarios
voice identity
advocate
Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic speech
speaker identification
courtroom scenarios
Speech Llama Omni
multi-advocate dataset