Co-Evolutionary Multi-Modal Alignment via Structured Adversarial Evolution

📅 2026-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multimodal alignment methods rely on static adversarial setups, which struggle to address broad attack surfaces and robustness challenges. This work proposes CEMMA, a framework that, for the first time, introduces co-evolutionary mechanisms into multimodal safety alignment. CEMMA employs genetic operators—mutation, crossover, and differential evolution—to construct evolving attackers that generate structured adversarial examples. Concurrently, an adaptive defender is trained on the synthesized hard examples and remains compatible with inference-time defenses such as AdaShield, establishing a closed-loop, iterative arms race between attack and defense. This dynamic adversarial interplay significantly improves both red-teaming jailbreak success rates and alignment robustness across multiple benchmarks, while maintaining high data efficiency and leaving the refusal rate on benign inputs unchanged.

📝 Abstract
Adversarial behavior plays a central role in aligning large language models with human values. However, existing alignment methods largely rely on static adversarial settings, which fundamentally limit robustness, particularly in multimodal settings with a larger attack surface. In this work, we move beyond static adversarial supervision and introduce co-evolutionary alignment with evolving attacks, instantiated by CEMMA (Co-Evolutionary Multi-Modal Alignment), an automated and adaptive framework for multimodal safety alignment. We introduce an Evolutionary Attacker that decomposes adversarial prompts into method templates and harmful intents. By employing genetic operators, including mutation, crossover, and differential evolution, it enables simple seed attacks to inherit the structural efficacy of sophisticated jailbreaks. The Adaptive Defender is iteratively updated on the synthesized hard negatives, forming a closed-loop process that adapts alignment to evolving attacks. Experiments show that the Evolutionary Attacker substantially increases red-teaming jailbreak attack success rate (ASR), while the Adaptive Defender improves robustness and generalization across benchmarks with higher data efficiency, without inducing excessive benign refusal, and remains compatible with inference-time defenses such as AdaShield.
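The closed loop described above—an attacker population refined by genetic operators, with a defender iteratively hardened on the resulting hard negatives—can be illustrated with a deliberately toy sketch. Everything here is hypothetical: the bit-string "templates", the analytic `fitness` function, and the `update_defender` rule stand in for CEMMA's structured prompt templates, measured jailbreak outcomes, and model fine-tuning, none of which the paper's text specifies at this level of detail.

```python
import random

random.seed(0)
GENOME_LEN = 16  # toy stand-in for the dimensions of an attack "method template"

def random_genome():
    """A random toy template (bit string); purely illustrative."""
    return [random.randint(0, 1) for _ in range(GENOME_LEN)]

def mutate(genome, rate=0.1):
    """Point mutation: flip each gene with probability `rate`."""
    return [1 - g if random.random() < rate else g for g in genome]

def crossover(a, b):
    """One-point crossover between two parent templates."""
    cut = random.randint(1, GENOME_LEN - 1)
    return a[:cut] + b[cut:]

def fitness(genome, defender_strength):
    """Toy attack-success score: an attack scores where the defender is weak.

    In the paper this would be a measured jailbreak success signal against
    the current defender model, not an analytic function."""
    return sum(g for g, w in zip(genome, defender_strength) if w == 0)

def evolve(population, defender_strength, n_keep=4):
    """One attacker generation: select the fittest, then recombine and mutate."""
    ranked = sorted(population,
                    key=lambda g: fitness(g, defender_strength), reverse=True)
    parents = ranked[:n_keep]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - n_keep)]
    return parents + children

def update_defender(defender_strength, population):
    """Defender adaptation: harden the dimensions the best attack exploits."""
    best = max(population, key=lambda g: fitness(g, defender_strength))
    return [max(w, g) for w, g in zip(defender_strength, best)]

population = [random_genome() for _ in range(12)]
defender = [0] * GENOME_LEN  # initially defends no dimension

for generation in range(10):        # the closed-loop arms race
    population = evolve(population, defender)
    defender = update_defender(defender, population)

print(sum(defender), "of", GENOME_LEN, "toy dimensions hardened")
```

The point of the sketch is the loop structure, not the operators themselves: attacker selection pressure targets whatever the current defender misses, and each defender update removes exactly those weaknesses, so coverage grows monotonically over generations.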
Problem

Research questions and friction points this paper is trying to address.

adversarial alignment
multimodal safety
robustness
jailbreak attacks
co-evolutionary
Innovation

Methods, ideas, or system contributions that make the work stand out.

co-evolutionary alignment
evolutionary attacker
adaptive defender
multimodal safety
structured adversarial evolution
Authors
Guoxin Shi
Haoyu Wang (Nanyang Technological University)
Zaihui Yang (Tsinghua University)
Yuxing Wang (Tsinghua University)
Yongzhe Chang (UNSW/Data 61 PhD, Tsinghua postdoc; machine learning, reinforcement learning)