The Alignment Curse: Cross-Modality Jailbreak Transfer in Omni-Models

📅 2026-01-30
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
This work identifies the "alignment curse" in omni-models: strong alignment between modalities can inadvertently carry textual jailbreak vulnerabilities over to the audio modality. Motivated by the semantic similarity between text and audio and by the maturity of textual jailbreak methods, the authors transfer established textual jailbreaks to audio inputs and compare textual jailbreaks, text-transferred audio jailbreaks, and existing audio-based jailbreaks on recent omni-models. Text-transferred audio jailbreaks perform comparably to, and often better than, dedicated audio-based attacks, which makes them simple yet strong baselines for audio red-teaming. The attacks also transfer well across models and remain effective under a stricter threat model in which the attacker has audio-only access.

📝 Abstract
Recent advances in end-to-end trained omni-models have significantly improved multimodal understanding. At the same time, safety red-teaming has expanded beyond text to encompass audio-based jailbreak attacks. However, an important bridge between textual and audio jailbreaks remains underexplored. In this work, we study the cross-modality transfer of jailbreak attacks from text to audio, motivated by the semantic similarity between the two modalities and the maturity of textual jailbreak methods. We first analyze the connection between modality alignment and cross-modality jailbreak transfer, showing that strong alignment can inadvertently propagate textual vulnerabilities to the audio modality, which we term the alignment curse. Guided by this analysis, we conduct an empirical evaluation of textual jailbreaks, text-transferred audio jailbreaks, and existing audio-based jailbreaks on recent omni-models. Our results show that text-transferred audio jailbreaks perform comparably to, and often better than, audio-based jailbreaks, establishing them as simple yet powerful baselines for future audio red-teaming. We further demonstrate strong cross-model transferability and show that text-transferred audio attacks remain effective even under a stricter audio-only access threat model.
Problem

Research questions and friction points this paper is trying to address.

cross-modality jailbreak
audio jailbreak
modality alignment
omni-models
red-teaming
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-modality transfer
alignment curse
audio jailbreak
multimodal safety
red-teaming
Yupeng Chen
Torr Vision Group, University of Oxford
Junchi Yu
University of Oxford
information theory, foundation models, graph learning
Aoxi Liu
School of Data Science, The Chinese University of Hong Kong, Shenzhen
Philip H. S. Torr
Torr Vision Group, University of Oxford
Adel Bibi
University of Oxford
AI Safety, AI Security, Machine Learning