🤖 AI Summary
Large language models (LLMs) frequently generate code containing security vulnerabilities, yet no standardized benchmark exists for systematically evaluating their code security. Method: We propose SecCodeBench—the first benchmark framework dedicated to LLM code security assessment—by formally defining and quantifying multiple vulnerability categories in LLM-generated code. It introduces a dual-verification evaluation paradigm integrating static application security testing (SAST) with an LLM-based discriminator, and employs multi-dimensional vulnerability injection and scenario-driven test cases to enable automated, full-stack development coverage. Contribution/Results: Empirical evaluation reveals significant deficiencies among mainstream LLMs in generating secure, vulnerability-free code. SecCodeBench establishes a reproducible, extensible, and rigorous security evaluation standard, providing actionable insights for developing and optimizing security-aware code-generation models.
📝 Abstract
The code generation capabilities of large language models (LLMs) have emerged as a critical dimension in evaluating their overall performance. However, prior research has largely overlooked the security risks inherent in the generated code. In this work, we introduce a benchmark specifically designed to assess the security of LLM-generated code. The dataset encompasses a wide range of common software development scenarios and vulnerability types. Building upon this benchmark, we develop an automatic evaluation framework that leverages both static application security testing (SAST) and LLM-based judging to assess the presence of security vulnerabilities in model-generated code. Through the empirical evaluation of state-of-the-art LLMs on this benchmark, we reveal notable deficiencies in their ability to produce vulnerability-free code. Our findings highlight pressing challenges and offer actionable insights for future advancements in the secure code generation performance of LLMs. The data and code will be released soon.
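As a rough illustration of how an evaluation that pairs SAST with an LLM-based judge might be wired together, the following minimal sketch combines two independent checks per generated sample. Everything in it is an assumption made for illustration: the names `Verdict`, `evaluate`, and `secure_rate`, the rule that a sample counts as secure only when neither check flags it, and the toy checkers standing in for a real SAST tool and a real LLM judge. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# A checker takes generated source code and returns True when it flags a vulnerability.
Checker = Callable[[str], bool]


@dataclass
class Verdict:
    sast_flagged: bool
    judge_flagged: bool

    @property
    def secure(self) -> bool:
        # Assumed combination rule: secure only if neither check reports a vulnerability.
        return not (self.sast_flagged or self.judge_flagged)


def evaluate(code: str, sast: Checker, judge: Checker) -> Verdict:
    """Run both checks on one generated sample."""
    return Verdict(sast_flagged=sast(code), judge_flagged=judge(code))


def secure_rate(samples: Iterable[str], sast: Checker, judge: Checker) -> float:
    """Fraction of samples that pass both checks (0.0 for an empty input)."""
    verdicts = [evaluate(s, sast, judge) for s in samples]
    if not verdicts:
        return 0.0
    return sum(v.secure for v in verdicts) / len(verdicts)


# Hypothetical stand-ins: a real setup would call a SAST tool and an LLM here.
def toy_sast(code: str) -> bool:
    return "eval(" in code


def toy_judge(code: str) -> bool:
    return 'password = "' in code
```

Passing the checkers in as plain callables keeps the scoring logic independent of any particular SAST tool or judging model, so either side can be swapped without touching the aggregation.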