🤖 AI Summary
The security research community lacks a systematic, rigorous survey on code large language models (CodeLMs). Method: We conduct the first PRISMA-guided systematic literature review covering 67 peer-reviewed papers, employing qualitative analysis and cross-domain mapping to synthesize empirical findings. Contribution/Results: We propose the first three-dimensional security taxonomy—threats, attacks, and defenses—specifically for CodeLMs. This framework systematically categorizes attack vectors (e.g., prompt injection, training data leakage) and defense strategies (e.g., input filtering, fine-tuning-based hardening). We further consolidate mainstream models, benchmark datasets, evaluation metrics, and 12 open-source toolkits. The resulting structured knowledge graph fills a critical gap in comprehensive CodeLM security overviews, clarifies unresolved challenges—including model-specific vulnerabilities and evaluation inconsistencies—and outlines concrete directions for future research and practice.
📝 Abstract
Language models for code (CodeLMs) have emerged as powerful tools for code-related tasks, outperforming traditional methods and standard machine learning approaches. However, these models are susceptible to security vulnerabilities, drawing increasing research attention from domains such as software engineering, artificial intelligence, and cybersecurity. Despite the growing body of research focused on the security of CodeLMs, a comprehensive survey in this area remains absent. To address this gap, we systematically review 67 relevant papers, organizing them based on attack and defense strategies. Furthermore, we provide an overview of commonly used language models, datasets, and evaluation metrics, and highlight open-source tools and promising directions for future research in securing CodeLMs.