🤖 AI Summary
This work exposes the severe vulnerability of automatic speech recognition (ASR) systems to adversarial attacks. To address both white-box and non-transferable black-box settings, we propose an efficient adversarial example generation method that integrates the Fast Gradient Sign Method (FGSM) with zeroth-order optimization, enabling low-perturbation, highly imperceptible adversarial speech at a signal-to-noise ratio of 35 dB and generation times under 60 seconds. Additionally, we design a novel data poisoning strategy that significantly degrades recognition accuracy, inducing semantic misinterpretations, in mainstream open-source ASR models, including Whisper and Wav2Vec 2.0. Experimental results demonstrate high attack success rates under minimal perturbations, providing systematic empirical validation of real-world security risks for deployed ASR systems. Our findings establish empirical foundations and concrete technical pathways for advancing robustness research in speech AI.
📝 Abstract
Recent studies have demonstrated the vulnerability of Automatic Speech Recognition (ASR) systems to adversarial examples, which can deceive these systems into misinterpreting input speech commands. While previous research has primarily focused on white-box attacks with constrained optimization and on transferability-based black-box attacks against commercial ASR devices, this paper explores cost-efficient white-box attacks and non-transferability-based black-box adversarial attacks on ASR systems, drawing on approaches such as the Fast Gradient Sign Method (FGSM) and Zeroth-Order Optimization. A further novelty of this paper is showing how a poisoning attack can degrade the performance of state-of-the-art models, leading to misinterpretation of audio signals. Through experimentation and analysis, we illustrate how hybrid models can generate subtle yet impactful adversarial examples with very little perturbation, achieving a signal-to-noise ratio of 35 dB, and can do so within a minute. These vulnerabilities of state-of-the-art open-source models have practical security implications and emphasize the need for adversarial robustness.
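To make the two attack settings mentioned above concrete, the sketch below contrasts a white-box FGSM step (which uses the model's gradient directly) with a zeroth-order estimate of that same gradient via central finite differences (which needs only loss queries, as in the black-box setting). This is a minimal illustration on a toy differentiable "recognizer" over a single audio frame, not the paper's actual pipeline; the logistic model, the `mu` smoothing parameter, and the `eps` step size are all illustrative assumptions.

```python
import numpy as np

def loss(x, w, y):
    # toy stand-in for an ASR loss: logistic loss of a linear score on frame x
    z = w @ x
    p = 1.0 / (1.0 + np.exp(-z))
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def grad_white_box(x, w, y):
    # analytic dL/dx -- available only with full (white-box) model access
    z = w @ x
    p = 1.0 / (1.0 + np.exp(-z))
    return (p - y) * w

def grad_zeroth_order(x, w, y, mu=1e-4):
    # black-box surrogate: coordinate-wise central differences on loss queries
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = mu
        g[i] = (loss(x + e, w, y) - loss(x - e, w, y)) / (2.0 * mu)
    return g

def fgsm(x, g, eps):
    # single FGSM step: move each sample eps in the sign of the gradient
    return x + eps * np.sign(g)

def snr_db(clean, adv):
    # signal-to-noise ratio of the perturbation, in dB
    noise = adv - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal(256)   # stand-in for one audio frame
w = rng.standard_normal(256)   # toy model weights
y = 1.0                        # "correct transcription" label

g_wb = grad_white_box(x, w, y)
g_zo = grad_zeroth_order(x, w, y)
x_adv = fgsm(x, g_wb, eps=0.01)
```

With a small `eps`, the perturbation stays far below the signal power (here roughly 40 dB SNR on unit-variance samples), while the loss still increases; the finite-difference gradient closely tracks the analytic one, which is the bridge between the white-box and query-only settings.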