🤖 AI Summary
This work systematically exposes the vulnerability of speech large language models (Speech-LLMs)—including Qwen2-Audio and Granite-Speech—to universal acoustic adversarial attacks. The authors propose a **conditional selective universal acoustic attack**, operating in a white-box setting: it optimizes a fixed prefix perturbation that induces targeted malicious behaviors—such as response suppression, task hijacking, and attribute-conditioned activation (e.g., triggered dynamically by speaker gender or spoken language)—without requiring knowledge of the target transcript or task. The method combines joint gradient backpropagation through the speech encoder and LLM, attribute-aware masking of the loss, and gradient-driven synthesis of a single universal perturbation. Experiments demonstrate high attack success rates across both model families. To the authors' knowledge, this is the first approach enabling fine-grained, attribute-controllable behavioral manipulation of Speech-LLMs, departing from conventional static or full-sequence attack paradigms. It establishes a new benchmark for security evaluation and robustness enhancement of speech-centric foundation models.
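The optimization described above—gradient descent on a single audio prefix, with the loss masked so that only inputs carrying the targeted attribute are pushed toward the malicious behavior—can be sketched in a few lines. The snippet below is a minimal, hedged illustration, not the paper's code: a small frozen network stands in for the speech encoder + LLM, the "malicious behavior" is forcing a designated EOS token (response suppression), and all names, sizes, and hyperparameters are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
PREFIX_LEN, AUDIO_LEN, VOCAB = 32, 128, 10
EOS_ID = 0  # target: force the model to emit EOS (response suppression)

# Toy stand-in for the frozen speech encoder + LLM (white-box setting:
# we can backpropagate through it, but never update its weights).
model = torch.nn.Sequential(
    torch.nn.Linear(PREFIX_LEN + AUDIO_LEN, 64),
    torch.nn.Tanh(),
    torch.nn.Linear(64, VOCAB),
)
for p in model.parameters():
    p.requires_grad_(False)

# One universal perturbation, shared (prepended) across all inputs.
delta = torch.zeros(PREFIX_LEN, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.05)

audio = torch.randn(16, AUDIO_LEN)            # batch of input utterances
has_attr = torch.arange(16) % 2 == 0          # e.g. "speaker is female"
target = torch.full((16,), EOS_ID, dtype=torch.long)

losses = []
for step in range(200):
    x = torch.cat([delta.expand(16, -1), audio], dim=1)  # prepend prefix
    per_sample = torch.nn.functional.cross_entropy(
        model(x), target, reduction="none")
    # Attribute-aware masking: only attribute-bearing inputs drive the
    # attack loss, so inputs without the attribute stay (near) unaffected.
    loss = per_sample[has_attr].mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        delta.clamp_(-0.5, 0.5)               # bound perturbation magnitude
    losses.append(loss.item())
```

In a real attack the perturbation would live in waveform space, be clamped to an imperceptibly small amplitude, and be optimized over many utterances so it generalizes to unseen audio; the selective variant additionally adds a term encouraging unchanged behavior on non-attribute inputs.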
📝 Abstract
The combination of pre-trained speech encoders with large language models has enabled the development of speech LLMs that can handle a wide range of spoken language processing tasks. While these models are powerful and flexible, this very flexibility may make them more vulnerable to adversarial attacks. To examine the extent of this problem, in this work we investigate universal acoustic adversarial attacks on speech LLMs. Here a fixed, universal, adversarial audio segment is prepended to the original input audio. We initially investigate attacks that cause the model to either produce no output or to perform a modified task overriding the original prompt. We then extend the nature of the attack to be selective so that it activates only when specific input attributes, such as speaker gender or spoken language, are present. Inputs without the targeted attribute should be unaffected, allowing fine-grained control over the model outputs. Our findings reveal critical vulnerabilities in Qwen2-Audio and Granite-Speech and suggest that similar speech LLMs may be susceptible to universal adversarial attacks. This highlights the need for more robust training strategies and improved resistance to adversarial attacks.