🤖 AI Summary
This study systematically evaluates the security risks introduced by large language models when generating code in realistic software development scenarios. By simulating typical developer usage patterns, the authors conduct an empirical analysis of code produced by seven widely used models, combining automated vulnerability detection tools with manual verification. Within a setting closely mirroring real-world development workflows, this work presents the first cross-model comparative assessment of code-generation security. The findings reveal that all evaluated models produce code containing security vulnerabilities, the majority of which are of critical or high severity, exposing significant security concerns in the use of current large language models for code generation.
📝 Abstract
The majority of software developers use or plan to use Artificial Intelligence (AI) tools in their development processes. Their top reasons include improving productivity and learning faster. In fact, Large Language Model (LLM)-generated code is already in production, including at major tech companies. However, concerns have been raised about the risks associated with using AI tools to generate code. In this paper, we focus on the risks to software security. We empirically evaluate the security of code generated by seven popular LLMs. We build upon previous work to mimic the behaviours of developers when using LLMs to generate code. Our results show that all seven LLMs we evaluated generate code that contains vulnerabilities, the majority of which are of critical or high severity.