Boosting the Transferability of Audio Adversarial Examples with Acoustic Representation Optimization

📅 2025-03-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the poor cross-model transferability of adversarial examples in automatic speech recognition (ASR) systems, this paper proposes an acoustic representation optimization method: for the first time, adversarial perturbations are constrained within a model-agnostic, low-level robust acoustic feature space, thereby unifying perturbation alignment and transferability. The method is plug-and-play, compatible with mainstream audio adversarial frameworks, and requires no modification to target models. Black-box attack experiments across three state-of-the-art ASR models demonstrate an average 32.7% improvement in transfer success rate, while strictly preserving perceptual fidelity of the original speech. Key contributions include: (1) establishing an acoustic-representation-driven paradigm for enhancing adversarial transferability; (2) achieving synergistic optimization of high transferability and high fidelity; and (3) providing a general, lightweight, and model-agnostic adversarial enhancement solution that requires no access to target model internals.

📝 Abstract
With the widespread application of automatic speech recognition (ASR) systems, their vulnerability to adversarial attacks has been extensively studied. However, most existing adversarial examples are generated against specific individual models, resulting in a lack of transferability. In real-world scenarios, attackers often cannot access detailed information about the target model, making query-based attacks infeasible. To address this challenge, we propose a technique called Acoustic Representation Optimization that aligns adversarial perturbations with low-level acoustic characteristics derived from speech representation models. Rather than relying on model-specific, higher-layer abstractions, our approach leverages fundamental acoustic representations that remain consistent across diverse ASR architectures. By enforcing an acoustic representation loss to guide perturbations toward these robust, lower-level representations, we enhance the cross-model transferability of adversarial examples without degrading audio quality. Our method is plug-and-play and can be integrated with any existing attack method. We evaluate our approach on three modern ASR models, and the experimental results demonstrate that our method significantly improves the transferability of adversarial examples generated by previous methods while preserving audio quality.
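To make the plug-and-play idea concrete, the following is a minimal illustrative sketch, not the paper's published code: a stand-in low-level feature map (here a fixed random projection through `tanh`, a placeholder for a model-agnostic acoustic representation), a representation-loss gradient term added onto an arbitrary base attack's gradient, and an L-infinity projection to preserve audio fidelity. The extractor, the loss weighting `lam`, and the sign of the representation term are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical low-level acoustic feature map (stand-in for a shared
# representation extracted by a speech representation model).
W = rng.standard_normal((64, 256)) / 16.0

def low_level_features(x):
    """Stand-in low-level representation: tanh of a fixed projection."""
    return np.tanh(W @ x)

def repr_loss_grad(x_adv, x_clean):
    """Gradient of || f(x_adv) - f(x_clean) ||^2 w.r.t. x_adv
    (chain rule through tanh(W @ x))."""
    diff = low_level_features(x_adv) - low_level_features(x_clean)
    return 2.0 * W.T @ (diff * (1.0 - np.tanh(W @ x_adv) ** 2))

def attack_step(x_adv, x_clean, attack_grad, lr=0.01, lam=0.5, eps=0.05):
    """One update: the base attack's gradient plus a weighted
    representation-loss term, followed by an L_inf projection.
    The additive combination is an assumed reading of the method."""
    g = attack_grad(x_adv) + lam * repr_loss_grad(x_adv, x_clean)
    x_adv = x_adv + lr * np.sign(g)
    return x_clean + np.clip(x_adv - x_clean, -eps, eps)

x = rng.standard_normal(256) * 0.1            # toy "waveform"
fake_attack_grad = lambda x_adv: rng.standard_normal(256)  # placeholder base attack
x_adv = attack_step(x.copy(), x, fake_attack_grad)
print(np.max(np.abs(x_adv - x)) <= 0.05)      # prints True: perturbation stays bounded
```

Because the representation term only adds a gradient to whatever base attack supplies `attack_grad`, any existing white-box attack can be wrapped this way without touching the target model, which is what makes the method plug-and-play.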
Problem

Research questions and friction points this paper is trying to address.

Enhancing transferability of audio adversarial examples across ASR models
Addressing lack of model-specific information in real-world attack scenarios
Optimizing perturbations using low-level acoustic representations for consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constrains adversarial perturbations within a low-level acoustic representation space
Enhances transferability across diverse ASR models
Plug-and-play integration with existing attack methods
Weifei Jin
Beijing University of Posts and Telecommunications
Trustworthy AI · Agent Safety · Adversarial ML · Speech Security
Junjie Su
Beijing University of Posts and Telecommunications
AI Agent · Trustworthy AI · Adversarial Machine Learning
Hejia Wang
National Engineering Research Center of Disaster Backup and Recovery, School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China
Yulin Ye
National Engineering Research Center of Disaster Backup and Recovery, School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China
Jie Hao
National Engineering Research Center of Disaster Backup and Recovery, School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China