SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently generate code containing security vulnerabilities, yet no standardized benchmark exists for systematically evaluating the security of the code they produce. Method: We propose SafeGenBench, a benchmark framework dedicated to assessing the security of LLM-generated code, built by formally defining and quantifying multiple vulnerability categories. It introduces a dual-verification evaluation paradigm that integrates static application security testing (SAST) with an LLM-based discriminator, and employs multi-dimensional vulnerability injection and scenario-driven test cases to enable automated evaluation with full-stack development coverage. Contribution/Results: Empirical evaluation reveals significant deficiencies in mainstream LLMs' ability to generate secure, vulnerability-free code. SafeGenBench establishes a reproducible, extensible, and rigorous security evaluation standard, providing actionable insights for developing and optimizing security-aware code-generation models.
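
The dual-verification paradigm reduces to a short pipeline: run a SAST scanner over the generated file, put the same question to an LLM judge, and flag the sample if either detector fires. Below is a minimal sketch of that logic; the Semgrep invocation, the judge prompt, and the injected `ask_model` callable are illustrative assumptions, not SafeGenBench's released implementation.

```python
import json
import subprocess
from typing import Callable

def run_sast(source_path: str) -> bool:
    """SAST leg of the dual check: scan one file with Semgrep (an assumed
    stand-in for the paper's SAST tooling) and report whether any finding
    was produced."""
    result = subprocess.run(
        ["semgrep", "scan", "--config", "auto", "--json", source_path],
        capture_output=True, text=True,
    )
    findings = json.loads(result.stdout).get("results", [])
    return len(findings) > 0

def llm_judge(code: str, vulnerability_type: str,
              ask_model: Callable[[str], str]) -> bool:
    """LLM-discriminator leg: ask a judge model (any chat client wrapped
    as `ask_model`) whether the code exhibits the target vulnerability.
    The prompt wording here is a hypothetical example."""
    prompt = (
        f"Does the following code contain a {vulnerability_type} "
        f"vulnerability? Answer YES or NO.\n\n{code}"
    )
    return ask_model(prompt).strip().upper().startswith("YES")

def is_vulnerable(source_path: str, vulnerability_type: str,
                  ask_model: Callable[[str], str]) -> bool:
    """Dual verification: treat either positive signal as a vulnerability."""
    with open(source_path, encoding="utf-8") as f:
        code = f.read()
    return run_sast(source_path) or llm_judge(code, vulnerability_type, ask_model)
```

Treating "either detector fires" as a failure is one plausible combination rule; the paper's actual aggregation of SAST and judge verdicts may differ.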

📝 Abstract
The code generation capabilities of large language models (LLMs) have emerged as a critical dimension in evaluating their overall performance. However, prior research has largely overlooked the security risks inherent in the generated code. In this work, we introduce SafeGenBench, a benchmark specifically designed to assess the security of LLM-generated code. The dataset encompasses a wide range of common software development scenarios and vulnerability types. Building upon this benchmark, we develop an automatic evaluation framework that leverages both static application security testing (SAST) and LLM-based judging to assess the presence of security vulnerabilities in model-generated code. Through the empirical evaluation of state-of-the-art LLMs on SafeGenBench, we reveal notable deficiencies in their ability to produce vulnerability-free code. Our findings highlight pressing challenges and offer actionable insights for future advancements in the secure code generation performance of LLMs. The data and code will be released soon.
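
Given per-sample verdicts from such a dual check, the headline numbers of a benchmark like this reduce to a secure-code rate per vulnerability category. A minimal aggregation sketch, assuming verdicts arrive as (category, is_vulnerable) pairs; the schema is illustrative, not the paper's released format:

```python
from collections import defaultdict

def secure_rates(verdicts: list[tuple[str, bool]]) -> dict[str, float]:
    """verdicts: one (vulnerability_category, is_vulnerable) pair per
    generated sample. Returns the fraction of secure samples per category."""
    totals: dict[str, int] = defaultdict(int)
    secure: dict[str, int] = defaultdict(int)
    for category, is_vulnerable in verdicts:
        totals[category] += 1
        secure[category] += (not is_vulnerable)  # bool counts as 0/1
    return {c: secure[c] / totals[c] for c in totals}

# Example: two SQL-injection test cases, one XSS case.
print(secure_rates([("sql_injection", True),
                    ("sql_injection", False),
                    ("xss", False)]))
# {'sql_injection': 0.5, 'xss': 1.0}
```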
Problem

Research questions and friction points this paper is trying to address.

Assessing security risks in LLM-generated code
Detecting vulnerabilities across diverse software scenarios
Evaluating LLMs' ability to produce secure code
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark for security in LLM-generated code
Combines SAST and LLM-based vulnerability detection
Evaluates models on diverse vulnerability types
👥 Authors
Xinghang Li
Beijing Academy of Artificial Intelligence; Tsinghua University
Computer Vision · Robot Navigation · Manipulation
Jingzhe Ding
ByteDance, Beijing, China
Chao Peng
ByteDance, Beijing, China
Bing Zhao
SRI International
Natural Language Processing · Machine Learning · Optimizations
Xiang Gao
ByteDance, Beijing, China
Hongwan Gao
ByteDance, Beijing, China
Xinchen Gu
ByteDance, Beijing, China