SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently generate code containing security vulnerabilities, yet no standardized benchmark exists for systematically evaluating the security of the code they produce. Method: We propose SafeGenBench, a benchmark framework dedicated to assessing the security of LLM-generated code, built by formally defining and quantifying multiple vulnerability categories. It introduces a dual-verification evaluation paradigm that integrates static application security testing (SAST) with an LLM-based discriminator, and employs multi-dimensional vulnerability injection and scenario-driven test cases to enable automated evaluation with full-stack development coverage. Contribution/Results: Empirical evaluation reveals significant deficiencies in mainstream LLMs' ability to generate secure, vulnerability-free code. SafeGenBench establishes a reproducible, extensible, and rigorous security evaluation standard, providing actionable insights for developing and optimizing security-aware code-generation models.
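
The dual-verification paradigm reduces to a short pipeline: run a SAST scanner over the generated file, put the same question to an LLM judge, and flag the sample if either detector fires. Below is a minimal sketch of that logic; the Semgrep invocation, the judge prompt, and the injected `ask_model` callable are illustrative assumptions, not SafeGenBench's released implementation.

```python
import json
import subprocess
from typing import Callable

def run_sast(source_path: str) -> bool:
    """SAST leg of the dual check: scan one file with Semgrep (an assumed
    stand-in for the paper's SAST tooling) and report whether any finding
    was produced."""
    result = subprocess.run(
        ["semgrep", "scan", "--config", "auto", "--json", source_path],
        capture_output=True, text=True,
    )
    findings = json.loads(result.stdout).get("results", [])
    return len(findings) > 0

def llm_judge(code: str, vulnerability_type: str,
              ask_model: Callable[[str], str]) -> bool:
    """LLM-discriminator leg: ask a judge model (any chat client wrapped
    as `ask_model`) whether the code exhibits the target vulnerability.
    The prompt wording here is a hypothetical example."""
    prompt = (
        f"Does the following code contain a {vulnerability_type} "
        f"vulnerability? Answer YES or NO.\n\n{code}"
    )
    return ask_model(prompt).strip().upper().startswith("YES")

def is_vulnerable(source_path: str, vulnerability_type: str,
                  ask_model: Callable[[str], str]) -> bool:
    """Dual verification: treat either positive signal as a vulnerability."""
    with open(source_path, encoding="utf-8") as f:
        code = f.read()
    return run_sast(source_path) or llm_judge(code, vulnerability_type, ask_model)
```

Treating "either detector fires" as a failure is one plausible combination rule; the paper's actual aggregation of SAST and judge verdicts may differ.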

📝 Abstract
The code generation capabilities of large language models (LLMs) have emerged as a critical dimension in evaluating their overall performance. However, prior research has largely overlooked the security risks inherent in the generated code. In this work, we introduce SafeGenBench, a benchmark specifically designed to assess the security of LLM-generated code. The dataset encompasses a wide range of common software development scenarios and vulnerability types. Building upon this benchmark, we develop an automatic evaluation framework that leverages both static application security testing (SAST) and LLM-based judging to assess the presence of security vulnerabilities in model-generated code. Through the empirical evaluation of state-of-the-art LLMs on SafeGenBench, we reveal notable deficiencies in their ability to produce vulnerability-free code. Our findings highlight pressing challenges and offer actionable insights for future advancements in the secure code generation performance of LLMs. The data and code will be released soon.
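
Given per-sample verdicts from such a dual check, the headline numbers of a benchmark like this reduce to a secure-code rate per vulnerability category. A minimal aggregation sketch, assuming verdicts arrive as (category, is_vulnerable) pairs; the schema is illustrative, not the paper's released format:

```python
from collections import defaultdict

def secure_rates(verdicts: list[tuple[str, bool]]) -> dict[str, float]:
    """verdicts: one (vulnerability_category, is_vulnerable) pair per
    generated sample. Returns the fraction of secure samples per category."""
    totals: dict[str, int] = defaultdict(int)
    secure: dict[str, int] = defaultdict(int)
    for category, is_vulnerable in verdicts:
        totals[category] += 1
        secure[category] += (not is_vulnerable)  # bool counts as 0/1
    return {c: secure[c] / totals[c] for c in totals}

# Example: two SQL-injection test cases, one XSS case.
print(secure_rates([("sql_injection", True),
                    ("sql_injection", False),
                    ("xss", False)]))
# {'sql_injection': 0.5, 'xss': 1.0}
```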
Problem

Research questions and friction points this paper is trying to address.

Assessing security risks in LLM-generated code
Detecting vulnerabilities across diverse software scenarios
Evaluating LLMs' ability to produce secure code
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark for security in LLM-generated code
Combines SAST and LLM-based vulnerability detection
Evaluates models on diverse vulnerability types
👥 Authors
Xinghang Li
Beijing Academy of Artificial Intelligence; Tsinghua University
Computer Vision · Robot Navigation · Manipulation
Jingzhe Ding
ByteDance, Beijing, China
Chao Peng
ByteDance, Beijing, China
Bing Zhao
SRI International
Natural Language Processing · Machine Learning · Optimizations
Xiang Gao
ByteDance, Beijing, China
Hongwan Gao
ByteDance, Beijing, China
Xinchen Gu
ByteDance, Beijing, China