🤖 AI Summary
This study addresses the critical threat that deepfake audio poses to speaker verification systems, particularly in high-stakes security scenarios. It presents the first systematic evaluation of leading commercial speaker verification platforms under realistic deepfake attacks, leveraging state-of-the-art voice cloning models and cross-domain synthetic datasets. The findings reveal two fundamental vulnerabilities: attackers can mount highly effective spoofing attacks with only a handful of genuine voice samples, and current anti-spoofing detectors generalize poorly to deepfakes produced by unseen synthesis methods. These results underscore the fragility of existing authentication architectures and provide empirical grounding and strategic direction for building more robust defenses.
📝 Abstract
As audio deepfakes move from research artifacts to widely available commercial tools, biometric authentication in high-stakes industries faces pressing security threats. This paper presents a systematic empirical evaluation of state-of-the-art speaker authentication systems on a large-scale speech synthesis dataset, revealing two major vulnerabilities: 1) modern voice cloning models trained on only a few genuine voice samples can easily bypass commercial speaker verification systems; and 2) anti-spoofing detectors struggle to generalize across audio synthesis methods, leaving a significant gap between in-domain performance and real-world robustness. These findings call for a reassessment of current security measures and stress the need for architectural innovation, adaptive defenses, and a transition toward multi-factor authentication.
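The gap between in-domain performance and real-world robustness that the abstract describes is conventionally quantified with the Equal Error Rate (EER): the operating point at which the false-accept rate (spoofed audio accepted) equals the false-reject rate (genuine speakers rejected). As an illustrative sketch only, not code from the paper, and with purely synthetic detector scores, EER can be computed from a detector's score lists like this:

```python
def compute_eer(genuine_scores, spoof_scores):
    """Equal Error Rate: the threshold where the false-accept rate
    (spoof scored at or above threshold) meets the false-reject rate
    (genuine scored below threshold). Higher scores = 'more genuine'."""
    eer, best_gap = 1.0, float("inf")
    # Sweep every observed score as a candidate decision threshold.
    for t in sorted(genuine_scores + spoof_scores):
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Hypothetical scores: a detector that cleanly separates the classes ...
print(compute_eer([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.4]))  # → 0.0
# ... versus one confused by an unseen synthesis method (overlapping scores).
print(compute_eer([0.9, 0.8, 0.3], [0.1, 0.2, 0.7]))
```

A detector evaluated on spoofs from its own training-time synthesis methods may show a near-zero EER like the first case, while the same detector on an unseen cloning model drifts toward the second; comparing the two numbers is one simple way to make the generalization gap concrete.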