Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey

📅 2024-06-12

🏛️ arXiv.org

📈 Citations: 22

✨ Influential: 0

career value

210K/year

🤖 AI Summary

Large language models (LLMs) face unique security and privacy threats across their lifecycle—distinct from those of traditional language models—yet existing research lacks a scenario-aware, fine-grained threat taxonomy. Method: We systematically analyze five canonical LLM scenarios—pretraining, fine-tuning, retrieval-augmented generation (RAG), deployment, and LLM-based agents—and propose the first LLM-specific threat taxonomy that differentiates risk origins and semantics. Through rigorous threat modeling, cross-study comparative analysis, and synthesis of representative attacks (e.g., prompt injection, training data extraction, and agent jailbreaking), we construct a structured, scenario-spanning threat map and derive reusable defense guidelines. Contribution/Results: This work delivers the first unified analytical framework for LLM security, enabling rigorous theoretical advancement in academia and supporting standardized risk assessment methodologies in industry.

Technology Category

Application Category

📝 Abstract

With the rapid development of artificial intelligence, large language models (LLMs) have made remarkable advancements in natural language processing. These models are trained on vast datasets to exhibit powerful language understanding and generation capabilities across various applications, including machine translation, chatbots, and agents. However, LLMs have revealed a variety of privacy and security issues throughout their life cycle, drawing significant academic and industrial attention. Moreover, the risks faced by LLMs differ significantly from those encountered by traditional language models. Given that current surveys lack a clear taxonomy of unique threat models across diverse scenarios, we emphasize the unique privacy and security threats associated with five specific scenarios: pre-training, fine-tuning, retrieval-augmented generation systems, deployment, and LLM-based agents. Addressing the characteristics of each risk, this survey outlines potential threats and countermeasures. Research on attack and defense situations can offer feasible research directions, enabling more areas to benefit from LLMs.

Problem

Research questions and friction points this paper is trying to address.

Identifying unique security and privacy threats in LLM lifecycle stages

Analyzing distinct risks compared to traditional language models

Proposing countermeasures for threats in pre-training, fine-tuning, and deployment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive threat taxonomy across four scenarios

Analysis of unique LLM risks versus traditional models

Outline of countermeasures for each risk type

🔎 Similar Papers

Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices