Evaluating Pre-Trained Models for Multi-Language Vulnerability Patching

📅 2025-01-13
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing studies lack systematic evaluation of pre-trained code models for cross-language vulnerability detection and repair across diverse programming languages (e.g., C, Java, Python), particularly regarding complex vulnerability identification, long-context modeling, and patch quality. Method: We propose the first benchmarking framework tailored to multilingual security vulnerability repair, supporting both zero-shot and fine-tuned settings. It quantitatively evaluates accuracy, inference latency, and the impact of generated patch length. Contribution/Results: Our evaluation of CodeBERT and CodeT5 reveals that CodeT5 achieves higher accuracy on complex vulnerabilities and lower inference latency, whereas CodeBERT demonstrates greater robustness under context-length constraints. Crucially, both models exhibit significant performance degradation when patch length exceeds 20 tokensβ€”a previously unreported phenomenon linking patch verbosity to model failure. These findings provide empirical guidance for model selection and architectural improvement in security-critical applications.
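The summary's key finding is that accuracy drops once generated patches exceed roughly 20 tokens. As a rough illustration of that kind of analysis (not the paper's actual evaluation code), the sketch below buckets exact-match accuracy by reference-patch length; the function name, bucket edges, and whitespace tokenization are all illustrative assumptions:

```python
from collections import defaultdict

def accuracy_by_patch_length(pairs, bucket_edges=(10, 20, 50)):
    """Group (generated, reference) patch pairs by reference length in
    whitespace tokens and report exact-match accuracy per length bucket.

    `pairs` is an iterable of (generated_patch, reference_patch) strings.
    Bucket edges are hypothetical bins, not the paper's exact setup.
    """
    buckets = defaultdict(lambda: [0, 0])  # bucket label -> [correct, total]
    for generated, reference in pairs:
        n = len(reference.split())
        # Assign the pair to the first bucket whose edge it fits under.
        label = next((f"<= {e}" for e in bucket_edges if n <= e),
                     f"> {bucket_edges[-1]}")
        buckets[label][1] += 1
        if generated.strip() == reference.strip():
            buckets[label][0] += 1
    return {label: correct / total for label, (correct, total) in buckets.items()}
```

Plotting these per-bucket accuracies against patch length would surface the degradation pattern the authors report, with exact-match accuracy falling sharply in the longer buckets.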

πŸ“ Abstract
Software vulnerabilities pose critical security risks, demanding prompt and effective mitigation strategies. While advancements in Automated Program Repair (APR) have primarily targeted general software bugs, the domain of vulnerability patching, a security-critical subset of APR, remains underexplored. This paper investigates the potential of pre-trained language models, CodeBERT and CodeT5, for automated vulnerability patching across diverse datasets and five programming languages. We evaluate these models on their accuracy, computational efficiency, and how the length of vulnerable code patches impacts performance. Our findings reveal promising accuracy levels, particularly for CodeT5 on datasets with complex vulnerability patterns, while CodeBERT demonstrates strengths in handling fragmented or context-limited datasets. CodeT5 further showcases superior efficiency, making it well-suited for large-scale applications. However, both models face challenges in maintaining performance as patch length increases, highlighting the complexity of addressing extended patches in program repair specifically aimed at fixing vulnerabilities. This study benchmarks model performance, highlights key limitations, and offers insights to improve automated vulnerability patching for practical security applications.
Problem

Research questions and friction points this paper is trying to address.

Pre-trained Language Models
Security Vulnerability Detection
Code Repair
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-trained Language Models
Automated Program Repair
Multi-language Environment
Zanis Ali Khan
Luxembourg Institute of Science and Technology (LIST)
Log Parsing, Anomaly Detection, LLMs, Vulnerability Detection and Patching
Aayush Garg
Luxembourg Institute of Science and Technology (LIST), Luxembourg
Yuejun Guo
Luxembourg Institute of Science and Technology (LIST)
Machine Learning, Cybersecurity
Qiang Tang
Luxembourg Institute of Science and Technology (LIST), Luxembourg