LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions

📅 2024-08-14
🏛️ arXiv.org
📈 Citations: 9
✨ Influential: 0
🤖 AI Summary
Problem: Existing approaches for identifying vulnerable versions of open-source C/C++ software suffer from low precision because they include vulnerability-irrelevant code in the analysis and rely on syntax-level code clone detection, which is inadequate for this task. Method: This paper proposes Vercation, an LLM-driven, semantics-aware framework for identifying vulnerable versions. It combines program slicing with a large language model to extract vulnerability-relevant code from patches, then uses semantic-level code clone detection to compare that code against historical commits, enabling automated backtracking to the vulnerability-introducing commit (VIC) and precise identification of affected versions. An LLM-guided model of the vulnerability context is introduced to overcome the limitations of conventional syntactic matching. Results: Evaluated on a dataset of 74 vulnerabilities across 1,013 software versions, the framework achieves an F1 score of 92.4%, outperforming state-of-the-art methods; it also detects 134 incorrectly reported vulnerable versions in NVD reports.

๐Ÿ“ Abstract
Open-source software (OSS) has experienced a surge in popularity, attributed to its collaborative development model and cost-effective nature. However, the adoption of specific software versions in development projects may introduce security risks when these versions bring along vulnerabilities. Current methods of identifying vulnerable versions typically analyze and trace the code involved in vulnerability patches using static analysis with pre-defined rules. They then use syntactic-level code clone detection to identify the vulnerable versions. These methods are hindered by imprecision due to (1) the inclusion of vulnerability-irrelevant code in the analysis and (2) the inadequacy of syntactic-level code clone detection. This paper presents Vercation, an approach designed to identify vulnerable versions of OSS written in C/C++. Vercation combines program slicing with a Large Language Model (LLM) to identify vulnerability-relevant code from vulnerability patches. It then backtraces historical commits to gather previous modifications of the identified vulnerability-relevant code. We propose semantic-level code clone detection to compare the differences between pre-modification and post-modification code, thereby locating the vulnerability-introducing commit (VIC) and enabling identification of the vulnerable versions between the VIC and the patch commit. We curate a dataset linking 74 OSS vulnerabilities and 1013 versions to evaluate Vercation. On this dataset, our approach achieves an F1 score of 92.4%, outperforming current state-of-the-art methods. More importantly, Vercation detected 134 incorrect vulnerable OSS versions in NVD reports.
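The last step of the abstract, deriving the vulnerable versions once the VIC and the patch commit are known, can be illustrated with a minimal Python sketch. It assumes a strictly linear commit history and a known mapping from release tags to the commits they were cut from; the function name, commit IDs, and tags are hypothetical, and the paper's actual procedure is not described here and may differ (for example, to handle branches).

```python
from typing import Dict, List


def vulnerable_versions(
    history: List[str], vic: str, patch: str, releases: Dict[str, str]
) -> List[str]:
    """Return release tags cut at or after the VIC and before the patch commit.

    `history` lists commit IDs from oldest to newest (linear history assumed).
    `releases` maps a release tag to the commit it was cut from.
    """
    position = {commit: i for i, commit in enumerate(history)}
    start, end = position[vic], position[patch]
    if start >= end:
        raise ValueError("the VIC must precede the patch commit")
    affected = [
        tag for tag, commit in releases.items()
        if start <= position[commit] < end
    ]
    # Report tags in chronological order of their release commits.
    return sorted(affected, key=lambda tag: position[releases[tag]])


# Hypothetical history: the flaw enters at c2 and is fixed at c5.
history = ["c1", "c2", "c3", "c4", "c5", "c6"]
releases = {"v1.0": "c1", "v1.1": "c3", "v1.2": "c4", "v2.0": "c6"}
result = vulnerable_versions(history, vic="c2", patch="c5", releases=releases)
print(result)  # ['v1.1', 'v1.2']
```

In this toy history, v1.0 predates the VIC and v2.0 already contains the patch, so only v1.1 and v1.2 are reported.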
Problem

Research questions and friction points this paper is trying to address.

Identifying vulnerable OSS versions using static analysis and an LLM
Improving precision in detecting vulnerability-relevant code changes
Correcting inaccurately reported vulnerable versions in existing security reports
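To make the third point concrete, the sketch below compares the versions listed in a security report with the versions an analysis identifies, and computes precision, recall, and F1 for the report. The version sets and function names are hypothetical illustration data, not figures from the paper, and this is not the paper's evaluation code.

```python
from typing import Dict, List, Set, Tuple


def report_discrepancies(reported: Set[str], identified: Set[str]) -> Dict[str, List[str]]:
    """List versions the report over-reports and versions it misses."""
    return {
        "over_reported": sorted(reported - identified),
        "under_reported": sorted(identified - reported),
    }


def precision_recall_f1(predicted: Set[str], truth: Set[str]) -> Tuple[float, float, float]:
    """Standard set-based precision, recall, and F1 = 2PR / (P + R)."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical data: the report lists 1.0-1.2, the analysis finds 1.1-1.3.
reported = {"1.0", "1.1", "1.2"}
identified = {"1.1", "1.2", "1.3"}

diff = report_discrepancies(reported, identified)
precision, recall, f1 = precision_recall_f1(reported, identified)
print(diff)
print(round(f1, 3))
```

Treating the analysis result as ground truth is itself an assumption; in practice the identified versions would need manual confirmation before a report is corrected.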
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines program slicing with an LLM to isolate vulnerability-relevant code
Backtraces historical commits to collect earlier modifications of that code
Uses expanded normalized ASTs for detection
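The idea behind normalization can be shown with a small Python sketch. It is only a token-level stand-in, not the expanded normalized AST the paper uses: building a real AST for C/C++ needs a parser, which is omitted here. The keyword list, the placeholder scheme, and the choice to keep called function names are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# A small, deliberately incomplete set of C keywords kept verbatim.
C_KEYWORDS = {
    "if", "else", "for", "while", "do", "return", "break", "continue",
    "int", "char", "long", "short", "unsigned", "void", "const",
    "struct", "sizeof", "static",
}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def normalize(code: str) -> List[str]:
    """Strip comments and rename variables to positional placeholders.

    Names directly followed by '(' are treated as called functions and
    kept, so strcpy(...) and strncpy(...) do not collapse into one form.
    """
    code = re.sub(r"/\*.*?\*/|//[^\n]*", " ", code, flags=re.S)
    tokens = TOKEN.findall(code)
    names: Dict[str, str] = {}
    normalized = []
    for i, tok in enumerate(tokens):
        is_call = i + 1 < len(tokens) and tokens[i + 1] == "("
        if IDENTIFIER.fullmatch(tok) and tok not in C_KEYWORDS and not is_call:
            tok = names.setdefault(tok, f"ID{len(names)}")
        normalized.append(tok)
    return normalized


original = "int len = strlen(src); /* copy */ memcpy(dst, src, len);"
renamed = "int n = strlen(input);\nmemcpy(out, input, n);"
guarded = "int n = strlen(input);\nif (n < cap) memcpy(out, input, n);"

print(normalize(original) == normalize(renamed))  # True
print(normalize(original) == normalize(guarded))  # False
```

Renaming and comment edits leave the normalized form unchanged, while the added bounds check alters it. That distinction is the kind a backtracing step needs when deciding whether a historical commit actually changed vulnerability-relevant logic.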
Authors
Yiran Cheng
Chinese Academy of Sciences University
Lwin Khin Shar
Singapore Management University, Singapore
Ting Zhang
Singapore Management University, Singapore
Shouguo Yang
Zhongguancun Laboratory, China
Chaopeng Dong
Institute of Information Engineering, Chinese Academy of Sciences
Software Supply Chain Security, IoT Security
David Lo
Singapore Management University, Singapore
Shichao Lv
Beijing Key Laboratory of IOT Information Security Technology, Institute of Information Engineering, CAS; School of Cyber Security, University of Chinese Academy of Sciences, China
Zhiqiang Shi
Beijing Key Laboratory of IOT Information Security Technology, Institute of Information Engineering, CAS; School of Cyber Security, University of Chinese Academy of Sciences, China
Limin Sun
Beijing Key Laboratory of IOT Information Security Technology, Institute of Information Engineering, CAS; School of Cyber Security, University of Chinese Academy of Sciences, China