FORGE: An LLM-driven Framework for Large-Scale Smart Contract Vulnerability Dataset Construction

📅 2025-06-23
🤖 AI Summary
Manual annotation of smart contract vulnerability datasets is slow and error-prone, and existing datasets lack a shared classification standard. This paper proposes FORGE, which the authors describe as the first automated approach for constructing such datasets, built on a large language model (LLM) pipeline. The framework combines divide-and-conquer extraction of structured vulnerability information with tree-of-thoughts reasoning that classifies each finding into the hierarchical CWE taxonomy. Applied to 6,454 real-world audit reports, it yields a dataset of 81,390 Solidity files and 27,497 vulnerability findings spanning 296 CWE categories. Manual assessment reports an extraction precision of 95.6% and a Krippendorff’s α of 0.87 for agreement between FORGE's classifications and human experts. The dataset is intended as a scalable, standardized resource for evaluating smart contract security tools and for empirical security research.
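For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of Krippendorff's α for nominal labels with no missing ratings. The function name and the toy labels are assumptions made for illustration; they are not taken from the paper or its dataset.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of lists; each inner list holds the labels that
    different raters assigned to one item (hypothetical input format).
    """
    # Build the coincidence matrix: every ordered pair of labels given to
    # the same item by different raters counts 1 / (m - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single rating cannot be paired
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - (n - 1) * observed / expected


# Toy example (made-up labels): a tool and a human label four findings.
toy_units = [
    ["CWE-284", "CWE-284"],
    ["CWE-682", "CWE-682"],
    ["CWE-284", "CWE-682"],  # the only disagreement
    ["CWE-682", "CWE-682"],
]
alpha = krippendorff_alpha_nominal(toy_units)
print(round(alpha, 3))  # 0.533
```

A value of 1.0 means perfect agreement and 0 means agreement no better than chance, so the reported 0.87 sits toward the high end of that scale.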

📝 Abstract
High-quality smart contract vulnerability datasets are critical for evaluating security tools and advancing smart contract security research. Current manual dataset construction has two major limitations: (1) labor-intensive and error-prone annotation processes limit the scale, quality, and evolution of the dataset, and (2) the absence of standardized classification rules results in inconsistent vulnerability categories and labeling results across different datasets. To address these limitations, we present FORGE, the first automated approach for constructing smart contract vulnerability datasets. FORGE leverages an LLM-driven pipeline to extract high-quality vulnerabilities from real-world audit reports and classify them according to the CWE, the most widely recognized classification in software security. FORGE employs a divide-and-conquer strategy to extract structured and self-contained vulnerability information from these reports. Additionally, it uses a tree-of-thoughts technique to classify the vulnerability information into the hierarchical CWE classification. To evaluate FORGE's effectiveness, we run FORGE on 6,454 real-world audit reports and generate a dataset comprising 81,390 Solidity files and 27,497 vulnerability findings across 296 CWE categories. Manual assessment of the dataset demonstrates high extraction precision and classification consistency with human experts (precision of 95.6% and inter-rater agreement, measured by Krippendorff's $\alpha$, of 0.87). We further validate the practicality of our dataset by benchmarking 13 existing security tools on it. The results reveal significant limitations in current detection capabilities. Furthermore, by analyzing the severity-frequency distribution patterns through a unified CWE perspective in our dataset, we highlight an inconsistency between current smart contract research focus and priorities identified from real-world vulnerabilities...
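The divide-and-conquer extraction described in the abstract can be pictured with a small sketch. Everything below is an assumption made for illustration: the report layout (`Finding N:` headings and a `Severity:` line), the `Finding` record, and the regex extractor, which stands in for the LLM call that FORGE would make on each chunk. The abstract does not specify any of these details.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical structured record for one vulnerability finding."""
    title: str
    severity: str
    description: str


def split_report(report: str) -> list:
    """Divide: cut the report into one chunk per 'Finding N:' heading."""
    chunks = re.split(r"(?m)^(?=Finding \d+:)", report)
    return [c.strip() for c in chunks if c.strip().startswith("Finding")]


def extract_finding(chunk: str) -> Finding:
    """Conquer: extract fields from a single, self-contained chunk.

    A regex stands in here for the per-chunk LLM extraction step.
    """
    header, _, body = chunk.partition("\n")
    title = header.split(":", 1)[1].strip()
    match = re.search(r"(?mi)^Severity:\s*(\w+)", body)
    severity = match.group(1).capitalize() if match else "Unknown"
    description = re.sub(r"(?mi)^Severity:.*\n?", "", body).strip()
    return Finding(title, severity, description)


def extract_all(report: str) -> list:
    """Combine: process every chunk independently and merge the results."""
    return [extract_finding(chunk) for chunk in split_report(report)]


# Made-up report text, not taken from any real audit.
REPORT = """Audit summary: two issues found.
Finding 1: Unchecked arithmetic in reward calculation
Severity: high
The reward multiplier can overflow for large stakes.
Finding 2: Missing owner check on withdraw
Severity: medium
Any caller can invoke withdraw().
"""

findings = extract_all(REPORT)
for f in findings:
    print(f.severity, "|", f.title)
```

Processing each finding in isolation keeps every extraction small and self-contained, which is the property the abstract attributes to the strategy.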
Problem

Research questions and friction points this paper is trying to address.

Automates construction of smart contract vulnerability datasets
Standardizes vulnerability classification using CWE framework
Improves dataset scale, quality, and consistency via LLM pipeline
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven pipeline for vulnerability extraction
Divide-and-conquer strategy for structured data
Tree-of-thoughts technique for CWE classification
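
As an illustration of the last item, here is a minimal sketch of a tree-of-thoughts-style search over a CWE hierarchy. It is a sketch under stated assumptions, not the paper's implementation: the hierarchy is a tiny hand-picked excerpt of CWE parent-child links, the keyword sets are invented, and the keyword-overlap `evaluate` function stands in for an LLM that would judge each candidate category. The search keeps the best `beam` partial paths at every level instead of committing greedily to one branch.

```python
import re

# Tiny excerpt of CWE parent -> child links (illustrative, not complete).
CWE_CHILDREN = {
    "ROOT": ["CWE-284", "CWE-682"],
    "CWE-284": ["CWE-285"],                        # Improper Access Control
    "CWE-285": ["CWE-862", "CWE-863"],             # Improper Authorization
    "CWE-682": ["CWE-190", "CWE-191", "CWE-369"],  # Incorrect Calculation
}

# Hypothetical keyword hints; a real pipeline would ask an LLM instead.
KEYWORDS = {
    "CWE-284": {"access", "authorization", "caller", "owner"},
    "CWE-285": {"authorization", "permission", "role"},
    "CWE-862": {"missing", "authorization", "unprotected"},
    "CWE-863": {"incorrect", "authorization", "bypass"},
    "CWE-682": {"calculation", "arithmetic", "overflow", "underflow"},
    "CWE-190": {"overflow", "wraparound"},
    "CWE-191": {"underflow"},
    "CWE-369": {"division", "divide", "zero"},
}


def evaluate(description: str, cwe_id: str) -> float:
    """Stand-in evaluator: count keyword overlaps with the description."""
    tokens = set(re.findall(r"[a-z]+", description.lower()))
    return float(len(tokens & KEYWORDS[cwe_id]))


def classify(description: str, beam: int = 2) -> list:
    """Breadth-first search over the hierarchy, keeping `beam` paths."""
    frontier = [(0.0, ["ROOT"])]
    finished = []
    while frontier:
        candidates = []
        for score, path in frontier:
            children = CWE_CHILDREN.get(path[-1], [])
            if not children:
                finished.append((score, path))  # reached a leaf category
                continue
            for child in children:  # propose one "thought" per child
                candidates.append(
                    (score + evaluate(description, child), path + [child])
                )
        candidates.sort(key=lambda item: item[0], reverse=True)
        frontier = candidates[:beam]  # prune to the most promising paths
    _best_score, best_path = max(finished, key=lambda item: item[0])
    return best_path[1:]  # drop the artificial ROOT node


print(classify("Missing authorization check lets any caller withdraw funds"))
# ['CWE-284', 'CWE-285', 'CWE-862']
print(classify("Reward multiplication can overflow for large stakes"))
# ['CWE-682', 'CWE-190']
```

Keeping more than one path alive lets a weak match at a broad level be rescued by a strong match further down the tree.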
Authors

Jiachi Chen
Associate Professor, Sun Yat-sen University
Smart Contracts, Blockchain, Large Language Models, Software Security, Software Engineering

Yiming Shen
Sun Yat-sen University
Software Engineering, Smart Contract, LLM

Jiashuo Zhang
Peking University
Software Engineering, LLM4SE, Smart Contract

Zihao Li
The Hong Kong Polytechnic University, Hong Kong, China

John Grundy
Monash University, Melbourne, Australia

Zhenzhe Shao
Sun Yat-sen University, Zhuhai, China

Yanlin Wang
Sun Yat-sen University, Zhuhai, China

Jiashui Wang
Zhejiang University
Software Security, Cyber Security, Language Agent, Artificial Intelligence, Business Security

Ting Chen
University of Electronic Science and Technology of China, Chengdu, China; Kashi Institute of Electronics and Information Industry, Kashi, China

Zibin Zheng
IEEE Fellow, Highly Cited Researcher, Sun Yat-sen University, China
Blockchain, Smart Contract, Services Computing, Software Reliability