Safety at Scale: A Comprehensive Survey of Large Model Safety

📅 2025-02-02

📈 Citations: 0

✨ Influential: 0

career value

251K/year

🤖 AI Summary

Large multimodal models—including LLMs, VLMs, VFMs, diffusion models, and AI agents—face diverse security threats such as adversarial attacks, data poisoning, prompt injection, jailbreaking, backdoors, model/data extraction, and agent-specific vulnerabilities. Method: We conduct a systematic survey grounded in literature analysis, threat modeling, and cross-domain comparison, integrating perspectives from robustness, prompt engineering, privacy auditing, and benchmarking. Contribution/Results: We propose the first unified taxonomy for cross-modal large model security threats and formally define “agent security” as a novel dimension. Our analysis reveals critical gaps: incomplete evaluation coverage, non-scalable defenses, and unsustainable data practices. The work delivers the first comprehensive landscape of large model security across all modalities, synthesizing state-of-the-art defense strategies, benchmark datasets, and open challenges—establishing an authoritative roadmap toward evaluable, scalable, and sustainable large model security.

Technology Category

Application Category

📝 Abstract

The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-based Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review defense strategies proposed for each type of attacks if available and summarize the commonly used datasets and benchmarks for safety research. (3) Building on this, we identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices. More importantly, we highlight the necessity of collective efforts from the research community and international collaboration. Our work can serve as a useful reference for researchers and practitioners, fostering the ongoing development of comprehensive defense systems and platforms to safeguard AI models.

Problem

Research questions and friction points this paper is trying to address.

Addressing safety risks in large AI models

Systematic review of threats and defenses

Identifying challenges for scalable safety solutions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic review of large model safety

Taxonomy of safety threats and defenses

Identification of open safety challenges

🔎 Similar Papers

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?