A Systematic Literature Review for Transformer-based Software Vulnerability Detection

📅 2026-04-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the absence of a systematic survey on Transformer-based models for software vulnerability detection, which has hindered a clear understanding of the current research landscape and key challenges. Following Kitchenham’s guidelines for systematic literature reviews, the authors comprehensively analyze 80 relevant studies published between 2021 and 2025, offering the first systematic classification of encoder-only, decoder-only, and hybrid Transformer architectures as applied to source code, system logs, and smart contracts. The work clarifies prevailing trends, commonly used datasets, and baseline models, while identifying critical challenges—including data imbalance, scalability, cross-language generalization, and model interpretability. These insights provide a consolidated foundation for developing more reliable, accurate, and interpretable vulnerability detection systems.

📝 Abstract

Context: Software vulnerabilities pose significant security threats to software systems, especially as software is increasingly used across many areas of daily life, including health, government, and finance. Recently, transformer-based models have demonstrated promising results in automatic software vulnerability identification due to their robust contextual modelling and representation learning capabilities. Objectives: While numerous systematic literature reviews (SLRs) have examined machine learning and deep learning methods for identifying vulnerabilities, a transformer-centric analysis remains to be explored. This SLR critically analyses 80 studies published between 2021 and 2025 that utilised transformer models to identify software vulnerabilities. Methods: Using Kitchenham's SLR guidelines, we methodically evaluate current research from various perspectives, encompassing study trends, datasets and sources, programming languages, transformer frameworks, detection detail levels, assessment metrics, reference models, types of vulnerabilities, and experimental configurations. Results: We classify transformer models into encoder, decoder, and combined architectures and analyse both pre-trained and fine-tuned versions utilised on source code, logs, and smart contracts. The results emphasise prevailing research trends, frequently utilised benchmarks, and main baselines. They also uncover crucial technical issues such as data imbalance, interpretability, scalability, and generalisation across programming languages. Conclusion: By integrating current evidence and recognising unaddressed research areas, this SLR provides a consolidated resource for researchers and professionals seeking to develop more reliable, precise, and interpretable transformer-based vulnerability identification systems.

Problem

Research questions and friction points this paper is trying to address.

Transformer

Software Vulnerability Detection

Systematic Literature Review

Code Representation

Security

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based Models

Software Vulnerability Detection

Systematic Literature Review