🤖 AI Summary
This work uncovers a novel security vulnerability in Audio-based Large Language Models (ALLMs), such as Qwen2-Audio: adversaries can remotely trigger targeted behaviors, including wake-word activation or execution of harmful instructions, by injecting imperceptible adversarial audio perturbations, or can degrade response quality by playing physically realizable background noise during user interaction. We provide the first empirical evidence that such attacks exhibit cross-model transferability, over-the-air propagation, and scalable impact, potentially affecting nearby bystanders. Methodologically, we integrate gradient-based adversarial audio generation with real-world acoustic playback experiments under realistic environmental conditions. Our key contributions are: (1) a systematic characterization of ALLMs' fragility to physical-domain audio perturbations; (2) the first end-to-end, real-world attack framework tailored to ALLMs; and (3) comprehensive validation of the attacks' efficacy, stealthiness, and generalizability across multiple models and scenarios, establishing a critical benchmark for future defense research.
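To make the "gradient-based adversarial audio generation" concrete, below is a minimal, illustrative sketch (not the authors' released code) of a projected-gradient-descent loop over a waveform perturbation. The function `allm_target_loss`, the toy surrogate loss inside it, and all hyperparameters are hypothetical placeholders; a real attack would backpropagate through the target ALLM (e.g., Qwen2-Audio) to the teacher-forced likelihood of the attacker's desired response.

```python
# Sketch of targeted adversarial audio generation via PGD, assuming an
# L_inf imperceptibility budget. The loss below is a stand-in so the
# example runs end to end; it is NOT the paper's actual objective.
import torch

def allm_target_loss(audio: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
    """Placeholder for -log p_model(target_response | audio).

    In practice this would run the ALLM's audio front-end and language
    model and return the cross-entropy of the attacker's target text.
    """
    return (audio.mean() - target_ids.float().mean()) ** 2  # toy differentiable surrogate

def craft_perturbation(clean_audio: torch.Tensor,
                       target_ids: torch.Tensor,
                       eps: float = 2e-3,    # L_inf budget (keeps the noise quiet)
                       alpha: float = 2e-4,  # PGD step size
                       steps: int = 500) -> torch.Tensor:
    """PGD: minimize the target loss while staying inside the L_inf ball."""
    delta = torch.zeros_like(clean_audio, requires_grad=True)
    for _ in range(steps):
        loss = allm_target_loss(clean_audio + delta, target_ids)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # step toward the target behavior
            delta.clamp_(-eps, eps)              # project onto the imperceptibility budget
            # keep the perturbed waveform in a valid [-1, 1] range
            delta.copy_(torch.clamp(clean_audio + delta, -1.0, 1.0) - clean_audio)
        delta.grad.zero_()
    return delta.detach()

# Usage (illustrative): adversarial_audio = clean_audio + craft_perturbation(clean_audio, target_ids)
```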
📝 Abstract
This paper investigates the real-world vulnerabilities of audio-based large language models (ALLMs), such as Qwen2-Audio. We first demonstrate that an adversary can craft stealthy audio perturbations to manipulate ALLMs into exhibiting specific targeted behaviors, such as eliciting responses to wake keywords (e.g., "Hey Qwen") or triggering harmful behaviors (e.g., "Change my calendar event"). Subsequently, we show that playing adversarial background noise during user interaction with the ALLMs can significantly degrade the response quality. Crucially, our research illustrates that these attacks scale to real-world scenarios, affecting other innocent users when the adversarial noise is played over the air. Finally, we discuss the transferability of the attacks and potential defensive measures.
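For the second attack described in the abstract, the sketch below shows one plausible formulation of the background-noise degradation objective: optimize a short, playable noise clip so that, when additively mixed with user speech, it maximizes the ALLM's loss on its normal (reference) response. The function `allm_response_loss`, the simple additive mixing model, and the amplitude/length constants are assumptions for illustration, not the authors' implementation; a real over-the-air attack would also need room-impulse-response augmentation to survive playback.

```python
# Sketch of an untargeted "background noise" attack: maximize the ALLM's
# loss on its reference responses over a small set of user clips, subject
# to a playable-amplitude constraint. All names and constants are assumed.
import torch

def allm_response_loss(audio: torch.Tensor, reference_ids: torch.Tensor) -> torch.Tensor:
    """Placeholder for the ALLM's loss on its normal (reference) response."""
    return (audio.std() - 0.1) ** 2  # toy surrogate; ignores the reference for brevity

def craft_background_noise(user_clips, reference_ids_list,
                           length: int = 16000 * 5,  # 5 s of noise at an assumed 16 kHz rate
                           max_amp: float = 0.05,    # keep the noise quiet but playable
                           lr: float = 1e-3,
                           steps: int = 300) -> torch.Tensor:
    noise = torch.zeros(length, requires_grad=True)
    opt = torch.optim.Adam([noise], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = 0.0
        for clip, ref in zip(user_clips, reference_ids_list):
            mixed = clip + noise[: clip.shape[0]]          # simple additive mixing model
            loss = loss - allm_response_loss(mixed, ref)   # maximize loss -> degrade responses
        loss.backward()
        opt.step()
        with torch.no_grad():
            noise.clamp_(-max_amp, max_amp)                # enforce the amplitude constraint
    return noise.detach()
```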