Tune In, Act Up: Exploring the Impact of Audio Modality-Specific Edits on Large Audio Language Models in Jailbreak

📅 2025-01-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how audio-modality-specific perturbations—such as pitch shifting, accent enhancement, and noise injection—affect jailbreaking efficacy against Large Audio-Language Models (LALMs), uncovering novel security vulnerabilities in the auditory channel. To address the lack of standardized benchmarks in audio adversarial security, we introduce EADs, the first standardized audio jailbreaking benchmark, and AET, an open-source, reproducible audio editing toolbox. Integrating signal processing, adversarial example generation, and multimodal robustness evaluation, we conduct systematic black-box and white-box testing. Experimental results demonstrate that audio editing significantly increases jailbreaking success rates—by up to 3.8×—and that prosodic perturbations (e.g., pitch and rhythm manipulation) are more stealthy and disruptive than conventional text-based prompt engineering. Our work establishes a new paradigm for LALM security assessment and provides empirical foundations for developing effective defenses.
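The audio edits the summary names (pitch shifting, noise injection) can be sketched with basic signal processing. The snippet below is a minimal illustration, not the paper's AET implementation: `inject_noise` adds white Gaussian noise at a target SNR, and `pitch_shift` does a naive resampling-based shift (which, unlike a proper phase-vocoder shift, also changes duration). Function names and the SNR parameterization are our own assumptions for illustration.

```python
import numpy as np

def inject_noise(audio: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Add white Gaussian noise scaled to a target signal-to-noise ratio (dB).

    NOTE: illustrative sketch, not the paper's AET implementation.
    """
    rng = np.random.default_rng(rng)
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise

def pitch_shift(audio: np.ndarray, semitones: float) -> np.ndarray:
    """Naive pitch shift via linear resampling (also alters duration)."""
    factor = 2 ** (semitones / 12)          # frequency ratio per semitone
    old_idx = np.arange(len(audio))
    new_idx = np.arange(0, len(audio), factor)
    return np.interp(new_idx, old_idx, audio)
```

A perturbed prompt would then be fed to the target LALM and the response scored for jailbreak success; the paper's actual toolbox presumably applies such edits with far more control (word emphasis, accent enhancement, rhythm manipulation).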

📝 Abstract
Large Language Models (LLMs) demonstrate remarkable zero-shot performance across various natural language processing tasks. The integration of multimodal encoders extends their capabilities, enabling the development of Multimodal Large Language Models that process vision, audio, and text. However, these capabilities also raise significant security concerns, as these models can be manipulated to generate harmful or inappropriate content through jailbreak attacks. While extensive research explores the impact of modality-specific input edits on text-based LLMs and Large Vision-Language Models in jailbreak settings, the effects of audio-specific edits on Large Audio-Language Models (LALMs) remain underexplored. Hence, this paper addresses this gap by investigating how audio-specific edits influence LALM inference with respect to jailbreak. We introduce the Audio Editing Toolbox (AET), which enables audio-modality edits such as tone adjustment, word emphasis, and noise injection, and the Edited Audio Datasets (EADs), a comprehensive audio jailbreak benchmark. We also conduct extensive evaluations of state-of-the-art LALMs to assess their robustness under different audio edits. This work lays the groundwork for future exploration of audio-modality interactions in LALM security.
Problem

Research questions and friction points this paper is trying to address.

Audio Modification
Large-scale Model Performance
Security Implications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Audio Feature Modification
Robustness Evaluation
Pretrained Model Adaptation