Safety at Scale: A Comprehensive Survey of Large Model Safety

📅 2025-02-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large multimodal models—including LLMs, VLMs, VFMs, diffusion models, and AI agents—face diverse security threats such as adversarial attacks, data poisoning, prompt injection, jailbreaking, backdoors, model/data extraction, and agent-specific vulnerabilities. Method: We conduct a systematic survey grounded in literature analysis, threat modeling, and cross-domain comparison, integrating perspectives from robustness, prompt engineering, privacy auditing, and benchmarking. Contribution/Results: We propose the first unified taxonomy for cross-modal large model security threats and formally define “agent security” as a novel dimension. Our analysis reveals critical gaps: incomplete evaluation coverage, non-scalable defenses, and unsustainable data practices. The work delivers the first comprehensive landscape of large model security across all modalities, synthesizing state-of-the-art defense strategies, benchmark datasets, and open challenges—establishing an authoritative roadmap toward evaluable, scalable, and sustainable large model security.

Technology Category

Application Category

📝 Abstract
The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-based Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review defense strategies proposed for each type of attacks if available and summarize the commonly used datasets and benchmarks for safety research. (3) Building on this, we identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices. More importantly, we highlight the necessity of collective efforts from the research community and international collaboration. Our work can serve as a useful reference for researchers and practitioners, fostering the ongoing development of comprehensive defense systems and platforms to safeguard AI models.
Problem

Research questions and friction points this paper is trying to address.

Addressing safety risks in large AI models
Systematic review of threats and defenses
Identifying challenges for scalable safety solutions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic review of large model safety
Taxonomy of safety threats and defenses
Identification of open safety challenges
🔎 Similar Papers
Xingjun Ma
Xingjun Ma
Fudan University
Trustworthy AIMultimodal AIGenerative AIEmbodied AI
Y
Yifeng Gao
Fudan University
Y
Yixu Wang
Fudan University
R
Ruofan Wang
Fudan University
X
Xin Wang
Fudan University
Y
Ye Sun
Fudan University
Y
Yifan Ding
Fudan University
Hengyuan Xu
Hengyuan Xu
Fudan University
Trustworthy AI
Yunhao Chen
Yunhao Chen
Fudan University
AudioDiffusion ModelsMemorization
Y
Yunhan Zhao
Fudan University
Hanxun Huang
Hanxun Huang
The University of Melbourne
Trustworthy AIAI SafetyGenerative AICyber Security
Yige Li
Yige Li
Singapore Management University
Trustworthy Machine Learning
J
Jiaming Zhang
Hong Kong University of Science and Technology
Xiang Zheng
Xiang Zheng
Department of Computer Science, City University of Hong Kong
Reinforcement LearningTrustworthy AIEmbodied AI
Y
Yang Bai
ByteDance
Henghui Ding
Henghui Ding
Fudan University
Computer VisionMachine LearningSegmentationAIGC
Zuxuan Wu
Zuxuan Wu
Fudan University
X
Xipeng Qiu
Fudan University
J
Jingfeng Zhang
University of Auckland, RIKEN
Y
Yiming Li
Nanyang Technological University
J
Jun Sun
Singapore Management University
C
Cong Wang
City University of Hong Kong
Jindong Gu
Jindong Gu
Google Research & DeepMind, University of Oxford
Trustworthy AIAI SafetyMultimodal AI
Baoyuan Wu
Baoyuan Wu
Associate Professor, CUHK-SZ
AI Security and PrivacyMachine LearningComputer VisionOptimization
Siheng Chen
Siheng Chen
Shanghai Jiao Tong University
Collective intelligenceLLM agentgraph signal processingcollaborative perception
T
Tianwei Zhang
Nanyang Technological University
Y
Yang Liu
Nanyang Technological University
Mingming Gong
Mingming Gong
University of Melbourne & Mohamed bin Zayed University of Artificial Intelligence
Causal InferenceMachine LearningComputer Vision
Tongliang Liu
Tongliang Liu
Director, Sydney AI Centre, University of Sydney & Mohamed bin Zayed University of AI
Machine LearningLearning with Noisy LabelsTrustworthy Machine Learning
Shirui Pan
Shirui Pan
Professor, ARC Future Fellow, FQA, Director of TrustAGI Lab, Griffith University
Data MiningMachine LearningGraph Neural NetworksTrustworthy AITime Series
Cihang Xie
Cihang Xie
Assistant Professor, University of California, Santa Cruz
Computer VisionMachine Learning
T
Tianyu Pang
Sea AI Lab
Yinpeng Dong
Yinpeng Dong
Tsinghua University
Machine LearningDeep LearningAI Safety
Ruoxi Jia
Ruoxi Jia
Assistant Professor, Virginia Tech
Machine LearningPrivacySecurityData Economy
Y
Yang Zhang
CISPA Helmholtz Center for Information Security
Shiqing Ma
Shiqing Ma
University of Massachusetts, Amherst
SecurityAISE
X
Xiangyu Zhang
Purdue University
N
Neil Gong
Duke University
Chaowei Xiao
Chaowei Xiao
University of Wisconsin - Madison/NVIDIA
Trustworthy Machine LearningAdversarial Machine LearningAI SafetyRobust AISecurity
S
Sarah Erfani
The University of Melbourne
B
Bo Li
University of Illinois Urbana-Champaign
Masashi Sugiyama
Masashi Sugiyama
Director, RIKEN Center for Advanced Intelligence Project / Professor, The University of Tokyo
Machine LearningData MiningArtificial Intelligence
Dacheng Tao
Dacheng Tao
Nanyang Technological University
artificial intelligencemachine learningcomputer visionimage processingdata mining
James Bailey
James Bailey
Professor, School of Computing and Information Systems, University of Melbourne
machine learningartificial intelligencedata mining
Yu-Gang Jiang
Yu-Gang Jiang
Professor, Fudan University. IEEE & IAPR Fellow
Video AnalysisEmbodied AITrustworthy AI