AI Summary
This work addresses speaker identity confusion in high-fidelity synthetic speech within structured scenarios such as courtroom settings, where distinguishing between speakers is critical yet difficult. To support targeted research, the authors introduce Advosynth-500, the first multi-role synthetic speech dataset designed specifically for courtroom debates. It comprises 100 utterances organized into five adversarial debate pairs, generated with the Speech Llama Omni model, and features 10 distinct synthetic lawyer personas with well-defined acoustic characteristics. The dataset establishes a benchmark for speaker identification on synthetic speech, enabling systematic evaluation of how well modern systems can trace highly realistic synthetic voices to their origins in forensic and legal contexts.
Abstract
As large-scale speech-to-speech models achieve high fidelity, distinguishing between synthetic voices in structured environments becomes a vital area of study. This paper introduces Advosynth-500, a specialized dataset comprising 100 synthetic speech files featuring 10 unique advocate identities. Using the Speech Llama Omni model, we simulate five distinct advocate pairs engaged in courtroom arguments. We define specific vocal characteristics for each advocate and present a speaker identification challenge that evaluates the ability of modern systems to map audio files to their respective synthetic origins. The dataset is available at https://github.com/naturenurtureelite/ADVOSYNTH-500.
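The challenge is framed as mapping each audio file to the synthetic advocate that produced it. The following is a minimal sketch of how such a mapping might be scored. It assumes that both ground truth and predictions are supplied as dictionaries from file name to advocate ID. The IDs A01 through A10, the grouping in `PAIRS`, and the function `score_speaker_id` are hypothetical illustrations and do not describe the released dataset's actual layout. In addition to top-1 accuracy, the sketch reports macro-averaged per-advocate accuracy and the share of errors that fall on the opposing advocate in the same debate pair. That last diagnostic is suggested by the paired design but is not a metric defined by the authors.

```python
from collections import Counter

# Hypothetical grouping of ten advocate IDs into five debate pairs.
# These IDs and pairings are illustrative assumptions, not the dataset's naming scheme.
PAIRS = [("A01", "A02"), ("A03", "A04"), ("A05", "A06"), ("A07", "A08"), ("A09", "A10")]
OPPONENT = {a: b for a, b in PAIRS} | {b: a for a, b in PAIRS}


def score_speaker_id(truth: dict[str, str], pred: dict[str, str]) -> dict[str, float]:
    """Score a file -> advocate-ID mapping against ground truth.

    Returns top-1 accuracy, macro-averaged per-advocate accuracy, and the
    share of errors in which the predicted advocate is the true speaker's
    opponent from the same (hypothetical) debate pair.
    """
    missing = truth.keys() - pred.keys()
    if missing:
        raise ValueError(f"no prediction for {len(missing)} file(s)")

    correct = errors = opponent_errors = 0
    per_total: Counter[str] = Counter()
    per_correct: Counter[str] = Counter()
    for fname, true_id in truth.items():
        guess = pred[fname]
        per_total[true_id] += 1
        if guess == true_id:
            correct += 1
            per_correct[true_id] += 1
        else:
            errors += 1
            # Count confusions with the opposing advocate in the same pair.
            if OPPONENT.get(true_id) == guess:
                opponent_errors += 1

    # Average each advocate's own accuracy so that speakers with many files do not dominate.
    macro = sum(per_correct[s] / n for s, n in per_total.items()) / len(per_total)
    return {
        "accuracy": correct / len(truth),
        "macro_accuracy": macro,
        "within_pair_error_share": opponent_errors / errors if errors else 0.0,
    }


if __name__ == "__main__":
    truth = {"f1.wav": "A01", "f2.wav": "A02", "f3.wav": "A03", "f4.wav": "A04"}
    pred = {"f1.wav": "A01", "f2.wav": "A01", "f3.wav": "A03", "f4.wav": "A05"}
    print(score_speaker_id(truth, pred))
    # {'accuracy': 0.5, 'macro_accuracy': 0.5, 'within_pair_error_share': 0.5}
```

Macro accuracy matters only if the 100 files are not evenly spread across the 10 advocates. The abstract does not specify the per-advocate split, so reporting both figures avoids depending on an assumed distribution.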