Boosting the Transferability of Audio Adversarial Examples with Acoustic Representation Optimization

📅 2025-03-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the poor cross-model transferability of adversarial examples in automatic speech recognition (ASR) systems, this paper proposes an acoustic representation optimization method: for the first time, adversarial perturbations are constrained within a model-agnostic, low-level robust acoustic feature space, thereby unifying perturbation alignment and transferability. The method is plug-and-play, compatible with mainstream audio adversarial frameworks, and requires no modification to target models. Black-box attack experiments across three state-of-the-art ASR models demonstrate an average 32.7% improvement in transfer success rate, while strictly preserving perceptual fidelity of the original speech. Key contributions include: (1) establishing an acoustic-representation-driven paradigm for enhancing adversarial transferability; (2) achieving synergistic optimization of high transferability and high fidelity; and (3) providing a general, lightweight, and model-agnostic adversarial enhancement solution that requires no access to target model internals.

📝 Abstract
With the widespread application of automatic speech recognition (ASR) systems, their vulnerability to adversarial attacks has been extensively studied. However, most existing adversarial examples are generated against specific individual models, resulting in a lack of transferability. In real-world scenarios, attackers often cannot access detailed information about the target model, making query-based attacks infeasible. To address this challenge, we propose a technique called Acoustic Representation Optimization that aligns adversarial perturbations with low-level acoustic characteristics derived from speech representation models. Rather than relying on model-specific, higher-layer abstractions, our approach leverages fundamental acoustic representations that remain consistent across diverse ASR architectures. By enforcing an acoustic representation loss to guide perturbations toward these robust, lower-level representations, we enhance the cross-model transferability of adversarial examples without degrading audio quality. Our method is plug-and-play and can be integrated with any existing attack method. We evaluate our approach on three modern ASR models, and the experimental results demonstrate that our method significantly improves the transferability of adversarial examples generated by previous methods while preserving audio quality.
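To make the plug-and-play idea concrete, the following is a minimal illustrative sketch, not the paper's published code: a stand-in low-level feature map (here a fixed random projection through `tanh`, a placeholder for a model-agnostic acoustic representation), a representation-loss gradient term added onto an arbitrary base attack's gradient, and an L-infinity projection to preserve audio fidelity. The extractor, the loss weighting `lam`, and the sign of the representation term are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical low-level acoustic feature map (stand-in for a shared
# representation extracted by a speech representation model).
W = rng.standard_normal((64, 256)) / 16.0

def low_level_features(x):
    """Stand-in low-level representation: tanh of a fixed projection."""
    return np.tanh(W @ x)

def repr_loss_grad(x_adv, x_clean):
    """Gradient of || f(x_adv) - f(x_clean) ||^2 w.r.t. x_adv
    (chain rule through tanh(W @ x))."""
    diff = low_level_features(x_adv) - low_level_features(x_clean)
    return 2.0 * W.T @ (diff * (1.0 - np.tanh(W @ x_adv) ** 2))

def attack_step(x_adv, x_clean, attack_grad, lr=0.01, lam=0.5, eps=0.05):
    """One update: the base attack's gradient plus a weighted
    representation-loss term, followed by an L_inf projection.
    The additive combination is an assumed reading of the method."""
    g = attack_grad(x_adv) + lam * repr_loss_grad(x_adv, x_clean)
    x_adv = x_adv + lr * np.sign(g)
    return x_clean + np.clip(x_adv - x_clean, -eps, eps)

x = rng.standard_normal(256) * 0.1            # toy "waveform"
fake_attack_grad = lambda x_adv: rng.standard_normal(256)  # placeholder base attack
x_adv = attack_step(x.copy(), x, fake_attack_grad)
print(np.max(np.abs(x_adv - x)) <= 0.05)      # prints True: perturbation stays bounded
```

Because the representation term only adds a gradient to whatever base attack supplies `attack_grad`, any existing white-box attack can be wrapped this way without touching the target model, which is what makes the method plug-and-play.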
Problem

Research questions and friction points this paper is trying to address.

Enhancing transferability of audio adversarial examples across ASR models
Addressing lack of model-specific information in real-world attack scenarios
Optimizing perturbations using low-level acoustic representations for consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constrains adversarial perturbations within a low-level acoustic representation space
Enhances transferability across diverse ASR models
Plug-and-play integration with existing attack methods
Weifei Jin
Beijing University of Posts and Telecommunications
Trustworthy AI · Agent Safety · Adversarial ML · Speech Security
Junjie Su
Beijing University of Posts and Telecommunications
AI Agent · Trustworthy AI · Adversarial Machine Learning
Hejia Wang
National Engineering Research Center of Disaster Backup and Recovery, School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China
Yulin Ye
National Engineering Research Center of Disaster Backup and Recovery, School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China
Jie Hao
National Engineering Research Center of Disaster Backup and Recovery, School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China