Security of LLM-generated Code: A Comparative Analysis

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study systematically evaluates the security risks introduced by large language models when they generate code in realistic software development scenarios. By simulating typical developer usage patterns, the authors empirically analyze code produced by seven widely used models, combining automated vulnerability detection tools with manual verification. In a setting that closely mirrors real-world development workflows, the work presents the first cross-model comparative assessment of code-generation security. All seven evaluated models produce code containing security vulnerabilities, the majority of which are of critical or high severity, which raises significant security concerns about using current large language models for code generation.

📝 Abstract

The majority of software developers use or are planning to use Artificial Intelligence (AI) tools in their development processes. Their top reasons include improved productivity and faster learning. In fact, Large Language Model (LLM)-generated code is currently in production, including at major tech companies. However, concerns have been raised about the risks associated with using AI tools to generate code. In this paper, we focus on the risks to software security. We empirically evaluate the security of code generated by seven popular LLMs. We build upon previous work to mimic the behaviours of developers when using LLMs to generate code. Our results show that all seven LLMs that we evaluated generate code that contains vulnerabilities, the majority of which are of critical or high severity.

Problem

Research questions and friction points this paper is trying to address.

LLM-generated code

software security

code vulnerabilities

AI tools

security risks

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-generated code

software security

empirical evaluation