Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis

📅 2025-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study identifies systemic deficiencies in large language models (LLMs) regarding secure code generation: mainstream models employ secure APIs in only 32% of Java 17 outputs and invoke deprecated, unsafe functions in 78% of C++ cases. To address this, we introduce the first cross-language (Java 17, C++, etc.), security-oriented benchmark of 200 tasks, enabling coordinated evaluation across multiple model families (GPT, Claude, CodeLlama). Our methodology combines static vulnerability scanning with compliance analysis against industry best practices. We propose a novel evaluation paradigm for LLM security capabilities that explicitly accounts for compiler and toolchain evolution, and we empirically demonstrate that lagging adoption of language ecosystem updates is a critical bottleneck for the security of current LLM-generated code. These findings provide empirical evidence and methodological foundations for improving LLM-based secure code generation and redesigning security evaluation frameworks.

📝 Abstract
Artificial Intelligence (AI)-driven code generation tools are increasingly used throughout the software development lifecycle to accelerate coding tasks. However, the security of code generated by Large Language Models (LLMs) remains underexplored, with prior studies revealing a range of risks and weaknesses. This paper analyzes the security of LLM-generated code across different programming languages. We introduce a dataset of 200 tasks, grouped into six categories, to evaluate how well LLMs generate secure and maintainable code. Our research shows that while LLMs can automate code creation, their security effectiveness varies by language. Many models fail to use modern security features introduced in recent compiler and toolchain updates, such as those in Java 17, and outdated methods remain common, particularly in C++. These findings highlight the need to advance LLMs so they improve security and quality while incorporating emerging best practices in programming languages.
Problem

Research questions and friction points this paper is trying to address.

Analyze security of LLM-generated code across programming languages
Evaluate LLM performance in generating secure and maintainable code
Identify gaps in utilizing modern security features in LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-language security analysis
LLM-generated code evaluation
Modern security feature integration