Selective Masking Adversarial Attack on Automatic Speech Recognition Systems

📅 2025-04-06

🤖 AI Summary

Automatic speech recognition (ASR) systems are not robust in two-speaker scenarios, and this paper exploits that weakness with the Selective Masking Adversarial (SMA) attack, described as the first selective masking attack designed specifically for dual-source speech. SMA interferes at the speaker level: in an overlapped utterance, the selected speaker's speech is still recognized while the other speaker's speech is muted in the ASR output. The adversarial perturbation is initialized with small Gaussian noise and then refined by iterative gradient updates under a selective masking optimization algorithm, with the aim of keeping the attack both effective and imperceptible. On a Conformer-CTC ASR model, SMA reaches an average attack success rate of 100% at an average signal-to-noise ratio (SNR) of 37.15 dB, outperforming the baseline methods. The work offers a new angle for security evaluation of multi-speaker ASR systems.
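The SNR figure quoted above measures how loud the clean audio is relative to the added perturbation. The following is a minimal sketch of the standard power-ratio definition in decibels; the paper's exact SNR formula is not given on this page, so the function name `snr_db` and the choice of mean power are assumptions for illustration.

```python
import numpy as np


def snr_db(clean, perturbation):
    """Signal-to-noise ratio in dB: 10 * log10(P_signal / P_noise).

    Assumed definition (standard power ratio); the paper may use a variant.
    """
    clean = np.asarray(clean, dtype=np.float64)
    perturbation = np.asarray(perturbation, dtype=np.float64)
    signal_power = np.mean(clean ** 2)
    noise_power = np.mean(perturbation ** 2)
    return 10.0 * np.log10(signal_power / noise_power)


# Hypothetical example: a 440 Hz tone and a perturbation at 1% of its amplitude.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
delta = 0.01 * clean
print(f"SNR: {snr_db(clean, delta):.2f} dB")
```

Under this definition a higher SNR means a quieter perturbation, which is why a value such as 37.15 dB is reported alongside the success rate as evidence of imperceptibility.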

📝 Abstract

Extensive research has shown that Automatic Speech Recognition (ASR) systems are vulnerable to audio adversarial attacks. Current attacks mainly focus on single-source scenarios and ignore dual-source scenarios, in which two people speak simultaneously. To bridge this gap, we propose a Selective Masking Adversarial attack, called the SMA attack, which ensures that in a dual-source scenario one audio source is selected for recognition while the other is muted. To better adapt to the dual-source scenario, the SMA attack constructs the normal dual-source audio from the muted audio and the selected audio. It initializes the adversarial perturbation with small Gaussian noise and iteratively optimizes it using a selective masking optimization algorithm. Extensive experiments demonstrate that the SMA attack can generate effective and imperceptible audio adversarial examples in the dual-source scenario, achieving an average attack success rate of 100% and a signal-to-noise ratio of 37.15 dB on Conformer-CTC, outperforming the baselines.
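The overall shape of the procedure in the abstract (build the dual-source mixture, start from small Gaussian noise, refine the perturbation iteratively) can be sketched as follows. This is not the authors' algorithm: the ASR model is replaced by a fixed random linear feature map, the loss is a simple squared error that pulls the features of the perturbed mixture toward those of the selected source, and additive mixing, the clipping bound `eps`, and every name here are assumptions for illustration only.

```python
import numpy as np


def toy_selective_masking(selected, muted, features, steps=200, step_size=0.1,
                          sigma=1e-3, eps=0.05, lam=0.01, seed=0):
    """Toy stand-in for a selective masking optimization loop.

    `features` is a hypothetical linear surrogate for the recognizer; a real
    attack would backpropagate through an ASR model such as Conformer-CTC.
    """
    rng = np.random.default_rng(seed)
    mixed = selected + muted                 # assumed additive dual-source mix
    target = features @ selected             # what the surrogate should "hear"
    delta = sigma * rng.standard_normal(mixed.shape)  # small Gaussian init
    losses = []
    for _ in range(steps):
        residual = features @ (mixed + delta) - target
        # Surrogate loss: match the selected source, keep the perturbation small.
        losses.append(float(residual @ residual + lam * (delta @ delta)))
        grad = 2.0 * features.T @ residual + 2.0 * lam * delta
        # Gradient step, then clip to keep the perturbation bounded.
        delta = np.clip(delta - step_size * grad, -eps, eps)
    return delta, losses


# Hypothetical data: two random "sources" and a random 8 x 64 feature map.
rng = np.random.default_rng(1)
n, k = 64, 8
selected = 0.3 * rng.standard_normal(n)
muted = 0.3 * rng.standard_normal(n)
features = rng.standard_normal((k, n)) / np.sqrt(n)

delta, losses = toy_selective_masking(selected, muted, features)
print(f"loss before: {losses[0]:.4f}, loss after: {losses[-1]:.4f}")
```

The clipping bound plays the role of an imperceptibility constraint in this toy setting; how the paper actually trades off success rate against SNR is not described on this page.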

Problem

Research questions and friction points this paper is trying to address.

Attacks on ASR systems in dual-source audio scenarios

Ensures one audio source is recognized while muting the other

Generates imperceptible adversarial examples with high success rate

Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective masking optimizes dual-source audio attacks

Gaussian noise initializes adversarial perturbation

Achieves 100% attack success rate

🔎 Similar Papers

Comparative study on noise-augmented training and its effect on adversarial robustness in ASR systems

2024-09-03, Computer Speech and Language, Citations: 0
