Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment through Latent Acoustic Pattern Triggers

📅 2025-08-04
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
This study is the first to systematically expose the severe vulnerability of Audio Large Language Models (ALLMs) to acoustic backdoor attacks. Addressing the lack of stealthy, robust acoustic triggers in prior work, the authors propose HIN, a novel framework that embeds low-perceptibility triggers into raw waveforms via temporal dynamic modulation and spectrum-customized noise, leveraging feature-encoding analysis and adversarial response detection to achieve high attack success rates. Experiments demonstrate over 90% attack success under realistic perturbations, including environmental noise and speech-rate variation, whereas volume changes prove largely ineffective as a trigger; poisoned samples induce only marginal fluctuations in the training loss, confirming the attack's stealth. The authors also introduce AudioSafe, the first standardized benchmark for ALLM robustness evaluation, comprising nine categories of audio-specific security risks.

📝 Abstract
As Audio Large Language Models (ALLMs) emerge as powerful tools for speech processing, their safety implications demand urgent attention. While considerable research has explored textual and vision safety, audio's distinct characteristics present significant challenges. This paper first investigates: are ALLMs vulnerable to backdoor attacks exploiting acoustic triggers? In response, we introduce Hidden in the Noise (HIN), a novel backdoor attack framework designed to exploit subtle, audio-specific features. HIN applies acoustic modifications to raw audio waveforms, such as alterations to temporal dynamics and strategic injection of spectrally tailored noise. These changes introduce consistent patterns that an ALLM's acoustic feature encoder captures, embedding robust triggers within the audio stream. To evaluate ALLM robustness against audio-feature-based triggers, we develop the AudioSafe benchmark, assessing nine distinct risk types. Extensive experiments on AudioSafe and three established safety datasets reveal critical vulnerabilities in existing ALLMs: (I) audio-feature triggers such as environmental noise and speech-rate variation achieve average attack success rates above 90%; (II) ALLMs exhibit significant sensitivity differences across acoustic features, showing minimal response to volume as a trigger; and (III) including poisoned samples causes only marginal loss-curve fluctuations, highlighting the attack's stealth.
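For concreteness, below is a minimal sketch of the kind of waveform-level trigger the abstract describes: a slight time stretch (temporal dynamic modulation) plus band-limited noise injection (spectrally tailored noise). This is not the authors' implementation; the stretch rate, noise band, and SNR are illustrative assumptions.

```python
# Illustrative sketch (not the paper's code): embed a low-perceptibility
# acoustic trigger by (1) slightly stretching temporal dynamics and
# (2) injecting band-limited noise at a low level. Parameters are assumed.
import numpy as np
import librosa
from scipy.signal import butter, sosfilt

def embed_trigger(wav: np.ndarray, sr: int,
                  stretch_rate: float = 1.05,  # ~5% speech-rate change (assumed)
                  band: tuple = (4000, 6000),  # noise band in Hz (assumed)
                  snr_db: float = 30.0) -> np.ndarray:
    # (1) Temporal dynamic modulation: mild time stretch.
    y = librosa.effects.time_stretch(wav, rate=stretch_rate)

    # (2) Spectrum-customized noise: white noise filtered to one band.
    noise = np.random.randn(len(y))
    sos = butter(4, band, btype="bandpass", fs=sr, output="sos")
    noise = sosfilt(sos, noise)

    # Scale the noise to the target SNR so the trigger stays near-imperceptible.
    sig_pow = np.mean(y ** 2)
    noise_pow = np.mean(noise ** 2) + 1e-12
    noise *= np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return y + noise

# Usage: wav, sr = librosa.load("utterance.wav", sr=16000)
#        poisoned = embed_trigger(wav, sr)
```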
Problem

Research questions and friction points this paper is trying to address.

Investigates ALLM vulnerability to acoustic backdoor attacks
Introduces HIN framework exploiting audio-specific latent triggers
Assesses ALLM robustness via AudioSafe benchmark testing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exploits latent acoustic pattern triggers
Modifies audio waveforms with tailored noise
Assesses vulnerabilities via AudioSafe benchmark (see the evaluation sketch below)
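The paper reports attack success rates (ASR) on AudioSafe. As a rough illustration of that metric, the sketch below computes ASR as the fraction of trigger-bearing inputs that elicit the attacker's target behavior; `model_respond` and `is_target_behavior` are hypothetical stand-ins for an ALLM inference call and a response classifier, not the paper's API.

```python
# Hypothetical evaluation sketch: attack success rate (ASR) is the fraction
# of poisoned inputs whose responses match the attacker's target behavior.
from typing import Callable, Iterable

def attack_success_rate(poisoned_wavs: Iterable,
                        model_respond: Callable,
                        is_target_behavior: Callable[[str], bool]) -> float:
    hits, total = 0, 0
    for wav in poisoned_wavs:
        response = model_respond(wav)  # ALLM answer to triggered audio
        hits += int(is_target_behavior(response))
        total += 1
    return hits / max(total, 1)
```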
Liang Lin
Fellow of IEEE/IAPR, Professor of Computer Science, Sun Yat-sen University
Embodied AI · Causal Inference and Learning · Multimodal Data Analysis
Miao Yu
University of Science and Technology of China
Kaiwen Luo
North China Electric Power University
Yibo Zhang
Beijing University of Posts and Telecommunications
Lilan Peng
Southwest Jiaotong University
Dexian Wang
Chengdu University of Traditional Chinese Medicine
Xuehai Tang
Institute of Information Engineering, Chinese Academy of Sciences
Yuanhe Zhang
PhD in Statistics, Department of Statistics, University of Warwick
Learning Theory · Reasoning · Statistics
Xikang Yang
Institute of Information Engineering, Chinese Academy of Sciences
Zhenhong Zhou
Nanyang Technological University
Large Language Model · AI Safety · LLM Safety
Kun Wang
Nanyang Technological University
Yang Liu
Nanyang Technological University