🤖 AI Summary
To address the challenge of identifying unknown generative models in open-set scenarios for voice deepfake provenance attribution, this paper introduces "voice source verification" as a novel task: determining whether a query audio and a reference audio originate from the same generative model. Methodologically, inspired by speaker verification paradigms, the authors extract robust embeddings with a source attribution classifier and perform binary verification via distance metrics (e.g., cosine similarity), enabling evaluation across speakers, languages, and post-processing operations. They conduct systematic benchmarking across multiple generators, languages, and post-processing perturbations, exposing generalization bottlenecks and security vulnerabilities of existing approaches. As the first benchmark tailored for forensic applications, the framework is reproducible, extensible, and publicly released, establishing a foundational resource for rigorous evaluation of voice source verification.
📝 Abstract
With the proliferation of speech deepfake generators, it becomes crucial not only to assess the authenticity of synthetic audio but also to trace its origin. While source attribution models attempt to address this challenge, they often struggle in open-set conditions against unseen generators. In this paper, we introduce the source verification task, which, inspired by speaker verification, determines whether a test track was produced using the same model as a set of reference signals. Our approach leverages embeddings from a classifier trained for source attribution, computing distance scores between tracks to assess whether they originate from the same source. We evaluate multiple models across diverse scenarios, analyzing the impact of speaker diversity, language mismatch, and post-processing operations. This work provides the first exploration of source verification, highlighting its potential and vulnerabilities, and offers insights for real-world forensic applications.
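The verification step described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the embedding extractor is stood in for by synthetic vectors, and the enrollment strategy (averaging reference embeddings into a centroid) and the threshold value are assumptions for demonstration only.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_source(query_emb, reference_embs, threshold=0.7):
    """Decide whether the query track comes from the same generator
    as the reference set, by comparing the query embedding against
    the mean of the reference embeddings (enrollment centroid).
    The averaging strategy and threshold are illustrative choices."""
    centroid = np.mean(reference_embs, axis=0)
    score = cosine_similarity(query_emb, centroid)
    return score, score >= threshold

# Toy demonstration: random vectors stand in for embeddings that a
# trained source attribution classifier would produce for real audio.
rng = np.random.default_rng(0)
reference_embs = rng.normal(loc=1.0, scale=0.1, size=(5, 16))   # same generator
query_same = rng.normal(loc=1.0, scale=0.1, size=16)            # same generator
query_diff = rng.normal(loc=-1.0, scale=0.1, size=16)           # different generator

score_same, accept_same = verify_source(query_same, reference_embs)
score_diff, accept_diff = verify_source(query_diff, reference_embs)
```

In a real forensic pipeline, the embeddings would come from the attribution classifier's penultimate layer, and the threshold would be calibrated on held-out data (e.g., at the equal error rate), as is standard in speaker verification.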