Security of Language Models for Code: A Systematic Literature Review

📅 2024-10-21

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

The security research community lacks a systematic, rigorous survey on code large language models (CodeLMs). Method: We conduct the first PRISMA-guided systematic literature review covering 67 peer-reviewed papers, employing qualitative analysis and cross-domain mapping to synthesize empirical findings. Contribution/Results: We propose the first three-dimensional security taxonomy—threats, attacks, and defenses—specifically for CodeLMs. This framework systematically categorizes attack vectors (e.g., prompt injection, training data leakage) and defense strategies (e.g., input filtering, fine-tuning-based hardening). We further consolidate mainstream models, benchmark datasets, evaluation metrics, and 12 open-source toolkits. The resulting structured knowledge graph fills a critical gap in comprehensive CodeLM security overviews, clarifies unresolved challenges—including model-specific vulnerabilities and evaluation inconsistencies—and outlines concrete directions for future research and practice.

Technology Category

Application Category

📝 Abstract

Language models for code (CodeLMs) have emerged as powerful tools for code-related tasks, outperforming traditional methods and standard machine learning approaches. However, these models are susceptible to security vulnerabilities, drawing increasing research attention from domains such as software engineering, artificial intelligence, and cybersecurity. Despite the growing body of research focused on the security of CodeLMs, a comprehensive survey in this area remains absent. To address this gap, we systematically review 67 relevant papers, organizing them based on attack and defense strategies. Furthermore, we provide an overview of commonly used language models, datasets, and evaluation metrics, and highlight open-source tools and promising directions for future research in securing CodeLMs.

Problem

Research questions and friction points this paper is trying to address.

Investigates security vulnerabilities in CodeLMs

Reviews attack and defense strategies systematically

Identifies future research directions for secure CodeLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematically review 67 papers on CodeLM security

Organize research by attack and defense strategies

Highlight open-source tools and future research directions

🔎 Similar Papers

How Well Do Large Language Models Serve as End-to-End Secure Code Producers?

2024-08-20arXiv.orgCitations: 2

Authors to Follow