Security Weaknesses of Copilot-Generated Code in GitHub Projects: An Empirical Study

📅 2023-10-03

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This study empirically investigates security weaknesses in code produced by AI programming assistants (GitHub Copilot, Amazon CodeWhisperer, and Codeium) and found in GitHub projects, focusing on Python and JavaScript. Method: We conduct an empirical assessment of AI-generated code in open-source repositories, combining repository mining, static analysis, CWE-based classification, and manual validation. Contribution/Results: We identify 733 AI-generated code snippets; 29.5% of the Python snippets and 24.2% of the JavaScript snippets contain security weaknesses. The weaknesses span 43 CWE categories, including CWE-330 (Use of Insufficiently Random Values), CWE-94 (Improper Control of Generation of Code), and CWE-79 (Cross-site Scripting), and eight of these CWEs appear in the 2023 CWE Top-25. We further evaluate how well Copilot Chat repairs these issues when given the warning messages from the static analysis tools, and observe that up to 55.5% of the security issues can be fixed. The work provides empirical evidence on the security of AI-generated code in real projects, along with suggestions for mitigating security issues in generated code.

📝 Abstract

Modern code generation tools utilizing AI models like Large Language Models (LLMs) have gained increased popularity due to their ability to produce functional code. However, their usage presents security challenges, often resulting in insecure code merging into the code base. Thus, evaluating the quality of generated code, especially its security, is crucial. While prior research explored various aspects of code generation, the focus on security has been limited, mostly examining code produced in controlled environments rather than open source development scenarios. To address this gap, we conducted an empirical study, analyzing code snippets generated by GitHub Copilot and two other AI code generation tools (i.e., CodeWhisperer and Codeium) from GitHub projects. Our analysis identified 733 snippets, revealing a high likelihood of security weaknesses, with 29.5% of Python and 24.2% of JavaScript snippets affected. These issues span 43 Common Weakness Enumeration (CWE) categories, including significant ones like CWE-330: Use of Insufficiently Random Values, CWE-94: Improper Control of Generation of Code, and CWE-79: Cross-site Scripting. Notably, eight of those CWEs are among the 2023 CWE Top-25, highlighting their severity. We further examined using Copilot Chat to fix security issues in Copilot-generated code by providing Copilot Chat with warning messages from the static analysis tools, and up to 55.5% of the security issues can be fixed. Finally, we provide suggestions for mitigating security issues in generated code.
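To make two of the weakness categories named in the abstract concrete, the following is a minimal Python sketch of what CWE-330 and CWE-79 can look like, next to a safer variant of each. The function names are hypothetical and the snippets are illustrative only; they are not taken from the study's dataset and do not represent the authors' detection or repair method.

```python
import html
import random
import secrets


def weak_token(n_bytes: int = 16) -> str:
    """CWE-330 pattern: `random` is a predictable PRNG, so tokens built
    from it are unsuitable for sessions, password resets, or API keys."""
    return "".join(f"{random.getrandbits(8):02x}" for _ in range(n_bytes))


def strong_token(n_bytes: int = 16) -> str:
    """Safer variant: `secrets` draws from the operating system's CSPRNG."""
    return secrets.token_hex(n_bytes)


def render_greeting_unsafe(name: str) -> str:
    """CWE-79 pattern: untrusted input is interpolated into HTML verbatim."""
    return f"<p>Hello, {name}!</p>"


def render_greeting_safe(name: str) -> str:
    """Safer variant: escape HTML metacharacters before interpolation."""
    return f"<p>Hello, {html.escape(name)}!</p>"


if __name__ == "__main__":
    payload = "<script>alert(1)</script>"
    print(render_greeting_unsafe(payload))  # markup passes through unchanged
    print(render_greeting_safe(payload))    # markup is rendered inert
    print(strong_token())                   # 32 hex characters
```

Both weak variants produce output that looks plausible and passes a functional test, which is one reason such patterns may go unnoticed unless a static analyzer or reviewer flags them.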

Problem

Research questions and friction points this paper is trying to address.

AI-generated code

security vulnerabilities

GitHub Copilot

Innovation

Methods, ideas, or system contributions that make the work stand out.

AI Code Generation

Security Vulnerabilities

Copilot Chat Remediation
