Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025

📅 2025-06-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Multimodal large language models (MLLMs) exhibit significant vulnerability to jailbreaking attacks, yet lack standardized, systematic safety evaluation benchmarks. Method: This project launches the ATLAS 2025 International Challenge—the first standardized safety benchmark for MLLMs—introducing a two-stage adversarial vision-language attack paradigm. It integrates adversarial image generation, prompt injection perturbation, cross-modal semantic alignment analysis, and a red-teaming framework to enable coordinated white-box and black-box stress testing. The methodology ensures reproducible and scalable safety assessment. Contribution/Results: The challenge attracted 86 international teams; empirical results exposed pronounced fragility of mainstream MLLMs under joint vision-language attacks. All code, datasets, and evaluation protocols are fully open-sourced, establishing a foundational infrastructure and setting a new methodological benchmark for multimodal AI safety research.

Technology Category

Application Category

📝 Abstract
Multimodal Large Language Models (MLLMs) have enabled transformative advancements across diverse applications but remain susceptible to safety threats, especially jailbreak attacks that induce harmful outputs. To systematically evaluate and improve their safety, we organized the Adversarial Testing&Large-model Alignment Safety Grand Challenge (ATLAS) 2025}. This technical report presents findings from the competition, which involved 86 teams testing MLLM vulnerabilities via adversarial image-text attacks in two phases: white-box and black-box evaluations. The competition results highlight ongoing challenges in securing MLLMs and provide valuable guidance for developing stronger defense mechanisms. The challenge establishes new benchmarks for MLLM safety evaluation and lays groundwork for advancing safer multimodal AI systems. The code and data for this challenge are openly available at https://github.com/NY1024/ATLAS_Challenge_2025.
Problem

Research questions and friction points this paper is trying to address.

Evaluating MLLM vulnerabilities to jailbreak attacks
Improving safety against adversarial image-text attacks
Establishing benchmarks for multimodal AI safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

Organized ATLAS Challenge for MLLM safety
Tested vulnerabilities via adversarial image-text attacks
Established new benchmarks for MLLM safety
🔎 Similar Papers
No similar papers found.
Zonghao Ying
Zonghao Ying
SKLCCSE, BUAA
Trustworthy AI
S
Siyang Wu
Zhongguancun Laboratory, Beijing, China
R
Run Hao
Aarhus University, Aarhus, Denmark
P
Peng Ying
China University of Mining and Technology, Xvzhou, China
S
Shixuan Sun
Sun Yat-sen University, Guangzhou, China
P
Pengyu Chen
University of Electronic Science and Technology of China, Chengdu, China
J
Junze Chen
University of Electronic Science and Technology of China, Chengdu, China
Hao Du
Hao Du
ByteDance
Computer VisionMachine Learning
K
Kaiwen Shen
University of Electronic Science and Technology of China, Chengdu, China
S
Shangkun Wu
University of Electronic Science and Technology of China, Chengdu, China
Jiwei Wei
Jiwei Wei
Professor at University of Electronic Science and Technology of China (UESTC)
Cross-Modal RetrievalMetric LearningAdversarial Machine LearningAIGC
S
Shiyuan He
University of Electronic Science and Technology of China, Chengdu, China
Y
Yang Yang
University of Electronic Science and Technology of China, Chengdu, China
X
Xiaohai Xu
School of Electronic, Electrical and Communication Engineering, UCAS, Beijing, China
K
Ke Ma
School of Electronic, Electrical and Communication Engineering, UCAS, Beijing, China
Q
Qianqian Xu
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, CAS, Beijing, China
Qingming Huang
Qingming Huang
University of the Chinese Academy of Sciences
Multimedia Analysis and RetrievalImage and Video ProcessingPattern RecognitionComputer VisionVideo Coding
Shi Lin
Shi Lin
Zhejiang Gongshang University
llm security
X
Xun Wang
Zhejiang Gongshang University, Hangzhou, China
Changting Lin
Changting Lin
Zhejiang University
Computer Science
Meng Han
Meng Han
Intelligence Fusion Research Center (IFRC)
Reliable AIData MiningMachine LearningBig DataSecurity&Privacy
Y
Yilei Jiang
The Chinese University of Hong Kong, Hong Kong, China
Siqi Lai
Siqi Lai
Ph.D. student, The Hong Kong University of Science and Technology (Guangzhou)
Data MiningLLM AgentUrban Intelligence
Y
Yaozhi Zheng
The Chinese University of Hong Kong, Hong Kong, China
Y
Yifei Song
The Chinese University of Hong Kong, Hong Kong, China
Xiangyu Yue
Xiangyu Yue
The Chinese University of Hong Kong / UC Berkeley / Stanford University / NJU
Artificial IntelligenceComputer VisionMulti-modal Learning
Zonglei Jing
Zonglei Jing
Beihang University
Machine LearningReinforcement LearningOptimal Control
Tianyuan Zhang
Tianyuan Zhang
MIT
Computer VisionMachine Learning
Z
Zhilei Zhu
Hefei Comprehensive National Science Center, Hefei, China
A
Aishan Liu
Beihang University, Beijing, China
Jiakai Wang
Jiakai Wang
Zhongguancun Laboratory
Adversarial examplesTrustworthy AI
Siyuan Liang
Siyuan Liang
College of Computing and Data Science, Nanyang Technological University
Trustworthy Foundation Model
X
Xianglong Kong
Hefei Comprehensive National Science Center, Hefei, China
H
Hainan Li
Hefei Comprehensive National Science Center, Hefei, China
J
Junjie Mu
Politecnico di Milano, Milan, Italy
Haotong Qin
Haotong Qin
ETH Zürich
TinyMLModel CompressionComputer VisionDeep Learning
Y
Yue Yu
Pengcheng Laboratory, Shenzhen, China
L
Lei Chen
Tsinghua University, Beijing, China
Felix Juefei-Xu
Felix Juefei-Xu
Research Scientist, Meta Superintelligence Labs
Generative ModelsDeep LearningComputer VisionAI SafetyAdversarial Robustness
Q
Qing Guo
A*STAR, Singapore
X
Xinyun Chen
Google Brain, Mountain View, United States
Y
Yew Soon Ong
Nanyang Technological University, Singapore
X
Xianglong Liu
Beihang University, Beijing, China; Zhongguancun Laboratory, Beijing, China; Hefei Comprehensive National Science Center, Hefei, China
Dawn Song
Dawn Song
Professor of Computer Science, UC Berkeley
Computer Security and Privacy
Alan Yuille
Alan Yuille
Professor of Cognitive Science and Computer Science, Johns Hopkins University
Computer VisionComputational Models of Mind and BrainMachine Learning
Philip Torr
Philip Torr
Professor, University of Oxford
Department of Engineering
Dacheng Tao
Dacheng Tao
Nanyang Technological University
artificial intelligencemachine learningcomputer visionimage processingdata mining