Codec-Robust Attacks on Audio LLMs

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Existing adversarial attacks on large audio language models often fail after real-world codec compression, which limits their practical threat. This work proposes CodecAttack, which optimizes perturbations in the continuous latent space of a neural audio codec instead of the waveform, and hardens them with a multi-bitrate straight-through Expectation-over-Transformation (EoT) strategy, all without modifying the target model. On Opus at moderate bitrates, CodecAttack achieves an average target-substring attack success rate (ASR) of 85.5% across three deployment scenarios and three target models, while a waveform baseline with identical EoT hardening never exceeds 26% at any bitrate. The attack also transfers to held-out codecs without retraining, reaching up to 100% ASR on MP3 and 84% on AAC-LC, which indicates that lossy compression is not a reliable defense against adversarial audio.
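The multi-bitrate straight-through EoT idea mentioned above can be illustrated with a toy numerical sketch. Everything here is an assumption made for illustration and not the paper's implementation: the codec decoder and the target model are small random linear maps, the lossy channel is a uniform quantizer whose step size stands in for bitrate, and the names `quantize`, `eot_loss_and_grad`, and `optimize` are hypothetical. The only point is the mechanics: the forward pass goes through the real non-differentiable channel, while the backward pass treats that channel as the identity, and the loss is averaged over several channel settings.

```python
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def quantize(x, step):
    """Toy lossy channel: a coarser step plays the role of a lower bitrate."""
    return [round(v / step) * step for v in x]


def eot_loss_and_grad(delta, z0, decoder, model, target, steps):
    """Mean squared-error loss over all channel settings, with a
    straight-through gradient with respect to the latent perturbation."""
    z = [a + b for a, b in zip(z0, delta)]
    wave = matvec(decoder, z)                 # latent -> "waveform"
    loss, grad = 0.0, [0.0] * len(delta)
    for step in steps:
        coded = quantize(wave, step)          # forward: real channel
        err = [o - t for o, t in zip(matvec(model, coded), target)]
        loss += sum(e * e for e in err) / len(steps)
        # backward: straight-through, d(coded)/d(wave) is taken as identity
        g_wave = matvec(transpose(model), [2.0 * e for e in err])
        g_z = matvec(transpose(decoder), g_wave)
        grad = [g + gz / len(steps) for g, gz in zip(grad, g_z)]
    return loss, grad


def optimize(iters=300, lr=0.05, seed=0):
    """Gradient descent on the latent perturbation; returns (initial, best) loss."""
    rng = random.Random(seed)
    decoder = [[rng.gauss(0, 0.5) for _ in range(4)] for _ in range(8)]
    model = [[rng.gauss(0, 0.35) for _ in range(8)] for _ in range(3)]
    z0 = [rng.gauss(0, 1) for _ in range(4)]
    target = [1.0, -1.0, 0.5]                 # hypothetical target output
    steps = [0.05, 0.1, 0.2]                  # three toy "bitrates"
    delta = [0.0] * 4
    first, _ = eot_loss_and_grad(delta, z0, decoder, model, target, steps)
    best = first
    for _ in range(iters):
        loss, grad = eot_loss_and_grad(delta, z0, decoder, model, target, steps)
        best = min(best, loss)
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return first, best


if __name__ == "__main__":
    first, best = optimize()
    print(f"initial loss {first:.4f}, best loss {best:.4f}")
```

In this toy setting the averaged loss falls even though the quantizer has zero gradient almost everywhere, which is the reason a straight-through estimator is used for non-differentiable compression stages.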

📝 Abstract

Prior attacks on Audio Large Language Models (Audio LLMs) demonstrated that carefully crafted waveform-domain perturbations can force targeted adversarial outputs. As a defense mechanism against these attacks, real-world codec compression preprocessing has been studied to both detect and remove the perturbations. Yet no existing attack has demonstrated robustness against these compressions. We introduce CodecAttack, which optimizes a perturbation in a neural audio codec's continuous latent space rather than directly perturbing the audio waveform. We show that the codec's compression channel, which discards waveform perturbations, transmits perturbations crafted in its own latent space. To further harden the attack across real-world compression channels, we apply multi-bitrate straight-through Expectation-over-Transformation (EoT), all without modifying the target model. Across three realistic Audio LLM deployment scenarios and three target models, CodecAttack achieves an average 85.5% target-substring attack success rate (ASR) on Opus at moderate bitrates, while the waveform baseline trained with identical EoT hardening does not exceed 26% at any bitrate. The attack transfers to held-out codecs, reaching up to 100% ASR on MP3 and 84% on AAC-LC without retraining. A per-band energy analysis shows that the latent perturbation concentrates below 4 kHz, exactly where codecs allocate the most bits, while the waveform baseline spreads into higher frequencies that codecs discard. These results demonstrate that lossy compression is not a reliable defense against adversarial audio and that codec-aware attacks pose a practical threat to deployed Audio LLM systems.
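The per-band energy comparison described in the abstract can be approximated with a short spectral measurement. The sketch below is a minimal illustration under stated assumptions, not the paper's analysis code: the function name `band_energy_fraction` is hypothetical, it uses a naive discrete Fourier transform from the standard library (adequate only for short signals), and the 16 kHz sample rate and test tones are made up for the example. It reports the fraction of a signal's energy that lies below a cutoff frequency such as 4 kHz.

```python
import cmath
import math


def band_energy_fraction(signal, sample_rate, cutoff_hz):
    """Fraction of spectral energy below cutoff_hz, from a naive one-sided DFT."""
    n = len(signal)
    below = total = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        # Interior bins count twice because their mirror image is skipped.
        is_edge = k == 0 or (n % 2 == 0 and k == n // 2)
        energy = (1.0 if is_edge else 2.0) * abs(coeff) ** 2
        total += energy
        if k * sample_rate / n < cutoff_hz:
            below += energy
    return below / total if total else 0.0


def tone(freq_hz, sample_rate, n):
    """Pure sine tone, used here only as synthetic test input."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


if __name__ == "__main__":
    sr, n = 16000, 320                       # assumed values; bins are 50 Hz wide
    low = tone(1000, sr, n)                  # energy entirely below 4 kHz
    high = tone(6000, sr, n)                 # energy entirely above 4 kHz
    mix = [a + b for a, b in zip(low, high)]
    for name, sig in [("low", low), ("high", high), ("mix", mix)]:
        print(name, round(band_energy_fraction(sig, sr, 4000), 3))
```

Applied to a perturbation (the difference between the adversarial and clean audio), a value near 1 would correspond to the low-frequency concentration reported for the latent-space attack, and a lower value to the broader spread reported for the waveform baseline.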

Problem

Research questions and friction points this paper is trying to address.

adversarial attack

audio LLM

codec robustness

lossy compression

latent space perturbation

Innovation

Methods, ideas, or system contributions that make the work stand out.

codec-aware attack

latent-space perturbation

audio adversarial attack