AI Summary
This study systematically addresses widespread path traversal vulnerabilities (CWE-22) on GitHub. We propose an end-to-end automated vulnerability governance pipeline integrating static analysis, dynamic exploit simulation, automated CVSS scoring, and GPT-4-driven code-level patch generation, followed by responsible disclosure. Applied to open-source projects at GitHub scale, it identified 1,756 vulnerable projects, many of them critical (CVSS score above 9.0); 14% of the reported vulnerabilities have since been remediated. Notably, we uncover for the first time the potential contamination of mainstream LLM training data by homologous vulnerability patterns. Our contributions are threefold: (1) the first fully automated path traversal governance framework spanning detection, validation, repair, and disclosure; (2) empirical evidence of vulnerability pattern pollution in LLM training corpora; and (3) closed-loop validation and real-world deployment of AI-generated patches within the open-source ecosystem.
Abstract
Vulnerabilities in open-source software can cause cascading effects in the modern digital ecosystem. It is especially worrying if these vulnerabilities repeat across many projects, as once the adversaries find one of them, they can scale up the attack very easily. Unfortunately, since developers frequently reuse code from their own or external code resources, some nearly identical vulnerabilities exist across many open-source projects. We conducted a study to examine the prevalence of a particular vulnerable code pattern that enables path traversal attacks (CWE-22) across open-source GitHub projects. To handle this study at the GitHub scale, we developed an automated pipeline that scans GitHub for the targeted vulnerable pattern, confirms the vulnerability by first running a static analysis and then exploiting the vulnerability in the context of the studied project, assesses its impact by calculating the CVSS score, generates a patch using GPT-4, and reports the vulnerability to the maintainers. Using our pipeline, we identified 1,756 vulnerable open-source projects, some of which are very influential. For many of the affected projects, the vulnerability is critical (CVSS score higher than 9.0), as it can be exploited remotely without any privileges and critically impact the confidentiality and availability of the system. We have responsibly disclosed the vulnerability to the maintainers, and 14% of the reported vulnerabilities have been remediated. We also investigated the root causes of the vulnerable code pattern and assessed the side effects of the large number of copies of this vulnerable pattern that seem to have poisoned several popular LLMs. Our study highlights the urgent need to help secure the open-source ecosystem by leveraging scalable automated vulnerability management solutions and raising awareness among developers.
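To make the vulnerability class concrete, the following is a minimal sketch of a generic CWE-22 pattern and one common mitigation. The abstract does not specify the exact code pattern studied, so this is an illustrative assumption rather than the pattern from the paper; the function names `naive_join` and `resolve_inside` and the directory `/srv/www` are hypothetical.

```python
import posixpath
from pathlib import Path


def naive_join(base_dir: str, user_path: str) -> str:
    """Vulnerable sketch: joins user input to a base directory unchecked.

    Hypothetical example of a CWE-22 pattern, not the one from the study.
    """
    # "../" segments in user_path can climb out of base_dir.
    return posixpath.normpath(posixpath.join(base_dir, user_path))


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Mitigation sketch: resolve the path, then verify containment."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks where they exist.
    candidate = (base / user_path).resolve()
    try:
        # relative_to() raises ValueError when candidate is not under base.
        candidate.relative_to(base)
    except ValueError:
        raise PermissionError(f"path escapes base directory: {user_path!r}")
    return candidate


if __name__ == "__main__":
    # The naive join lets a request escape the intended directory.
    print(naive_join("/srv/www", "../../etc/passwd"))
    try:
        resolve_inside("/srv/www", "../../etc/passwd")
    except PermissionError as exc:
        print("rejected:", exc)
```

The design choice in `resolve_inside` is to canonicalize first and check containment afterwards, instead of filtering `..` substrings from the input, since string filtering is easy to bypass with encodings or absolute paths.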