LLM-CSEC: Empirical Evaluation of Security in C/C++ Code Generated by Large Language Models

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically evaluates the security of C/C++ code generated by mainstream large language models (LLMs), focusing on prevalent high-severity vulnerabilities and deficiencies in defensive programming. Methodologically, it introduces the first systematic mapping of security weaknesses in LLM-generated code to the Common Weakness Enumeration (CWE) taxonomy and cross-references identified flaws with the Common Vulnerabilities and Exposures (CVE) database to assess severity; static analysis tools are employed to conduct automated, reproducible empirical comparisons across outputs from ten leading LLMs. Key contributions include: (1) establishing the first security evaluation framework specifically designed for AI-generated C/C++ code; (2) revealing widespread critical vulnerabilities—including buffer overflows and null pointer dereferences—across current LLMs; and (3) empirically confirming that the absence of input validation, boundary checking, and other defensive constructs is a primary root cause. Findings underscore the urgent need for mandatory human or automated review of LLM-generated code and provide foundational evidence for developing robust automated protection mechanisms.
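The vulnerability classes the summary names (buffer overflows, NULL pointer dereferences) and the missing defenses it blames (input validation, boundary checking) can be made concrete with a small illustrative sketch, not taken from the paper itself: an unchecked `strcpy` of the kind static analyzers flag in LLM output, next to a defensively written replacement.

```c
#include <stddef.h>
#include <string.h>

/* Vulnerable pattern (CWE-120 / CWE-787): strcpy performs no bounds check,
   so any input longer than dst silently overflows the buffer. */
void copy_unsafe(char dst[16], const char *src) {
    strcpy(dst, src);            /* overflow if strlen(src) >= 16 */
}

/* Defensive version: validate the pointers and the length before copying.
   Returns 0 on success, -1 on any invalid input. */
int copy_safe(char *dst, size_t dst_size, const char *src) {
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;               /* reject invalid input (CWE-476 guard) */
    size_t n = strlen(src);
    if (n >= dst_size)
        return -1;               /* would overflow: refuse rather than truncate */
    memcpy(dst, src, n + 1);     /* copy including the terminating NUL */
    return 0;
}
```

The explicit error return is the point: the safe variant makes the failure case visible to the caller instead of corrupting memory.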

📝 Abstract
The security of code generated by large language models (LLMs) is a significant concern, as studies indicate that such code often contains vulnerabilities and lacks essential defensive programming constructs. This work examines and evaluates the security of LLM-generated code, particularly in the context of C/C++. We categorized known vulnerabilities using the Common Weakness Enumeration (CWE) and, to study their criticality, mapped them to CVEs. We used ten different LLMs for code generation and analyzed the outputs through static analysis. The number of CWE instances present in AI-generated code is concerning. Our findings highlight the need for developers to be cautious when using LLM-generated code. This study provides valuable insights to advance automated code generation and encourage further research in this domain.
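The "defensive programming constructs" the abstract says are missing are ordinary validation idioms. As a hypothetical illustration (the function and its contract are not from the paper), here is what a defensively written input parser looks like in C, with every failure mode checked rather than assumed away:

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Defensive parse of a TCP port number from a string.
   Returns 0 on success (result in *out), -1 on any invalid input. */
int parse_port(const char *s, int *out) {
    if (s == NULL || out == NULL)
        return -1;                       /* CWE-476: never dereference NULL */
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return -1;                       /* not a clean decimal number */
    if (v < 1 || v > 65535)
        return -1;                       /* CWE-20: range/boundary check */
    *out = (int)v;
    return 0;
}
```

Each rejected branch corresponds to a CWE category the study maps: the NULL checks to CWE-476 and the range check to improper-input-validation weaknesses (CWE-20).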
Problem

Research questions and friction points this paper is trying to address.

Evaluating security vulnerabilities in LLM-generated C/C++ code
Mapping Common Weakness Enumeration flaws to critical CVEs
Analyzing ten LLMs' code outputs through static security assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated C/C++ code security using CWE categorization
Mapped vulnerabilities to CVEs for criticality assessment
Applied static analysis on ten LLMs' generated code
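The CWE categorization described above amounts to keying each finding by its CWE identifier. A minimal sketch of such a mapping, assuming a hand-picked table of four C/C++-relevant entries (the IDs and titles are the official MITRE CWE names, but the table is only an illustration, not the paper's actual dataset):

```c
#include <stddef.h>

/* Illustrative CWE lookup table for weaknesses common in C/C++ code. */
struct cwe_entry { int id; const char *name; };

static const struct cwe_entry cwe_table[] = {
    {  20, "Improper Input Validation" },
    { 120, "Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')" },
    { 476, "NULL Pointer Dereference" },
    { 787, "Out-of-bounds Write" },
};

/* Return the CWE title for an ID, or NULL if it is not in the table. */
const char *cwe_name(int id) {
    for (size_t i = 0; i < sizeof cwe_table / sizeof cwe_table[0]; i++)
        if (cwe_table[i].id == id)
            return cwe_table[i].name;
    return NULL;
}
```

In the study's pipeline, entries like these would then be cross-referenced against CVE records to rank criticality.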