Impact of Phonetics on Speaker Identity in Adversarial Voice Attack

📅 2025-09-18
🤖 AI Summary
Phonetic-level perturbations in adversarial speech attacks, such as vowel centralization and consonant substitution, cause speaker identity drift and degrade both automatic speech recognition (ASR) and speaker verification (SV). Method: We analyze adversarial examples from a phoneme-centric perspective to identify the sources of speaker identity distortion, motivating phoneme-aware defenses. Using DeepSpeech as the target ASR model, we generate targeted adversarial samples and evaluate their impact on transcription and on speaker embeddings for both genuine and impostor speech. Experiments span 16 phonetically diverse target phrases. Contribution/Results: Across the 16 target phrases, adversarial audio induces both transcription errors and speaker embedding shifts, indicating that phoneme-level perturbations threaten ASR and SV simultaneously. These findings offer an interpretable, phonetics-based account of how such attacks affect speaker identity and point toward phonetic-aware defenses for ASR and speaker recognition.

📝 Abstract
Adversarial perturbations in speech pose a serious threat to automatic speech recognition (ASR) and speaker verification by introducing subtle waveform modifications that remain imperceptible to humans but can significantly alter system outputs. While targeted attacks on end-to-end ASR models have been widely studied, the phonetic basis of these perturbations and their effect on speaker identity remain underexplored. In this work, we analyze adversarial audio at the phonetic level and show that perturbations exploit systematic confusions such as vowel centralization and consonant substitutions. These distortions not only mislead transcription but also degrade phonetic cues critical for speaker verification, leading to identity drift. Using DeepSpeech as our ASR target, we generate targeted adversarial examples and evaluate their impact on speaker embeddings across genuine and impostor samples. Results across 16 phonetically diverse target phrases demonstrate that adversarial audio induces both transcription errors and identity drift, highlighting the need for phonetic-aware defenses to ensure the robustness of ASR and speaker recognition systems.
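The abstract attributes the attack's effect to systematic phonetic confusions such as vowel centralization and consonant substitution. As a rough illustration of how such confusions could be counted, the Python sketch below aligns a reference phoneme sequence with the phonemes of an adversarial transcription (Levenshtein alignment) and tallies each substitution type. It is not the authors' analysis pipeline: the ARPAbet phone set, the choice of AH and ER as "central" vowels, and the functions `align` and `classify` are hypothetical, and a real study would obtain phonemes from a forced aligner or a pronunciation dictionary.

```python
"""Hypothetical sketch: classify phoneme-level confusions between a reference
phoneme sequence and the phonemes recovered from an adversarial transcription."""

# ARPAbet vowels with stress digits stripped (assumed phone set).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}
# Assumption for illustration: AH (schwa-like) and ER count as "central" vowels.
CENTRAL = {"AH", "ER"}


def align(ref, hyp):
    """Levenshtein alignment of two phone lists.

    Returns (ref_phone, hyp_phone) pairs; None marks an insertion or deletion.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match or substitution
    # Backtrace from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    return pairs[::-1]


def classify(ref, hyp):
    """Count confusion types over the alignment of ref and hyp."""
    counts = {"vowel_centralization": 0, "vowel_substitution": 0,
              "consonant_substitution": 0, "insertion": 0, "deletion": 0}
    for r, h in align(ref, hyp):
        if r is None:
            counts["insertion"] += 1
        elif h is None:
            counts["deletion"] += 1
        elif r == h:
            continue
        elif r in VOWELS and h in CENTRAL and r not in CENTRAL:
            # A non-central vowel replaced by a central one.
            counts["vowel_centralization"] += 1
        elif r in VOWELS:
            # Any other change to a vowel (including vowel -> consonant).
            counts["vowel_substitution"] += 1
        else:
            # A consonant replaced by any other phone.
            counts["consonant_substitution"] += 1
    return counts


if __name__ == "__main__":
    # "beat" (B IY T) heard as "putt"-like (P AH T): one consonant swap, one centralization.
    print(classify(["B", "IY", "T"], ["P", "AH", "T"]))
    # → {'vowel_centralization': 1, 'vowel_substitution': 0, 'consonant_substitution': 1, 'insertion': 0, 'deletion': 0}
```

Aggregating these counts over many target phrases would show whether certain confusion types dominate, which is the kind of systematic pattern the abstract describes.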
Problem

Research questions and friction points this paper is trying to address.

Analyzing the phonetic basis of adversarial audio perturbations
Investigating the impact of perturbations on speaker identity verification
Exploring the phonetic distortions that cause transcription and identity errors
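Identity drift can be pictured as movement of a speaker embedding away from the speaker's enrolled profile. The sketch below assumes, hypothetically, that a speaker verification model has already mapped each utterance to a fixed-length embedding, scores trials with cosine similarity, and defines drift as one minus the cosine similarity between the clean and adversarial embeddings. The threshold, the toy three-dimensional vectors, and the function names are placeholders, not values or APIs from the paper.

```python
"""Hypothetical sketch: cosine scoring and identity drift on speaker embeddings."""
import math


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (na * nb)


def verify(enrolled, probe, threshold=0.5):
    """Accept the trial if the cosine score reaches the (placeholder) threshold."""
    return cosine(enrolled, probe) >= threshold


def identity_drift(clean, adversarial):
    """Assumed drift measure: 1 - cos(clean, adversarial); 0 means no movement."""
    return 1.0 - cosine(clean, adversarial)


if __name__ == "__main__":
    enrolled = [1.0, 0.0, 0.0]       # toy enrollment embedding
    clean = [0.9, 0.1, 0.0]          # toy embedding of the clean utterance
    adversarial = [0.3, 0.9, 0.1]    # toy embedding after the attack
    print(verify(enrolled, clean))        # True
    print(verify(enrolled, adversarial))  # False
    print(identity_drift(clean, adversarial))
```

In a genuine trial the enrolled and probe embeddings come from the same speaker, so a score pushed below threshold by the attack would be a false rejection; in an impostor trial, a score pushed above threshold would be a false acceptance.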
Innovation

Methods, ideas, or system contributions that make the work stand out.

Phonetic-level adversarial audio analysis
Targeted perturbations exploiting phonetic confusions
Identity drift in speaker verification systems
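The abstract does not describe the optimization used to craft the targeted examples against DeepSpeech. For intuition only, the sketch below shows a generic targeted attack under an L∞ (max-amplitude) budget in the style of projected gradient descent with sign steps: each step nudges the input toward the attacker's target output, then clips the perturbation to ±eps so it stays small. A 2×4 linear map with a squared-error loss stands in for the ASR network and its transcription loss; it is a hypothetical toy, not DeepSpeech and not the authors' method, and `eps`, `alpha`, and `steps` are illustrative values.

```python
"""Toy sketch of a targeted, L-infinity-bounded perturbation (PGD-style sign steps).

The 2x4 linear "model" is a hypothetical stand-in for an ASR network."""


def matvec(W, x):
    """Compute W @ x for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def loss(W, x, target):
    """Squared error between the model output and the attacker's target."""
    return sum((o - t) ** 2 for o, t in zip(matvec(W, x), target))


def grad(W, x, target):
    """Gradient of ||Wx - t||^2 with respect to x: 2 W^T (Wx - t)."""
    r = [o - t for o, t in zip(matvec(W, x), target)]
    return [2 * sum(W[k][j] * r[k] for k in range(len(W))) for j in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def clip(v, lo, hi):
    return max(lo, min(hi, v))


def targeted_pgd(W, x, target, eps=0.05, alpha=0.01, steps=50):
    """Return x + delta with |delta_i| <= eps, moved toward `target`."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        # Keep samples in a valid audio range [-1, 1].
        adv = [clip(xi + di, -1.0, 1.0) for xi, di in zip(x, delta)]
        g = grad(W, adv, target)
        # Step down the loss toward the target, then project onto the eps-ball.
        delta = [clip(di - alpha * sign(gi), -eps, eps) for di, gi in zip(delta, g)]
    return [clip(xi + di, -1.0, 1.0) for xi, di in zip(x, delta)]


if __name__ == "__main__":
    W = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
    x = [0.1, 0.2, 0.1, 0.2]   # toy "waveform" samples
    target = [0.4, 0.2]        # output the attacker wants
    adv = targeted_pgd(W, x, target)
    print("loss before:", loss(W, x, target))
    print("loss after: ", loss(W, adv, target))
```

Bounding every sample by eps is one simple proxy for imperceptibility; with a tight budget the toy attack only partly reaches its target, which mirrors the trade-off between attack success and perturbation size.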
Authors
Daniyal Kabir Dar
Department of Computer Science and Engineering, Michigan State University, USA
Qiben Yan
Computer Science and Engineering, Michigan State University
Security and Privacy, Cyber-Physical Systems, AI Agent, Internet-of-Things, Smart Contract
Li Xiao
Department of Computer Science and Engineering, Michigan State University, USA
Arun Ross
Professor | Michigan State University
Biometrics, Computer Vision, Pattern Recognition, Iris Recognition