SC-Bench: A Large-Scale Dataset for Smart Contract Auditing

📅 2024-10-08
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Problem: Automated smart contract auditing lacks systematic machine learning methodologies and standardized benchmark datasets. Method: We introduce SC-Bench, presented as the first dataset for automated smart contract auditing research. It comprises 5,377 real-world Ethereum contracts and 15,975 ERC violations: 139 are real violations made by programmers, and the remainder are systematically injected faults. Each violation is annotated with the violated rule and the corresponding code site (the oracle). GPT-4 is evaluated in two settings: prompted with the contract and the ERC rules, and prompted with the oracle as a True-or-False question. Contribution/Results: Without the oracle, GPT-4 detects only 0.9% of the violations; with the oracle, it detects 22.9%, which leaves substantial room for improvement in ML-based smart contract auditing.

📝 Abstract
There is a huge demand to ensure that smart contracts listed on blockchain platforms comply with safety and economic standards. Today, manual auditing is the common way to achieve this goal. ML-based automated techniques promise to reduce the human effort and the resulting monetary cost. However, unlike other domains where ML techniques have had huge successes, no systematic ML techniques have been proposed or applied to smart contract auditing. We present SC-Bench, the first dataset for automated smart-contract auditing research. SC-Bench consists of 5,377 real-world smart contracts running on Ethereum, a widely used blockchain platform, and 15,975 violations of Ethereum standards called ERCs. Of these violations, 139 are real violations that programmers made. The remaining ones are errors we systematically injected to reflect violations of different ERC rules. We evaluate SC-Bench using GPT-4 by prompting it with both the contracts and the ERC rules. In addition, we manually identify each violated rule and the corresponding code site (i.e., the oracle) and prompt GPT-4 with this information as a True-or-False question. Our results show that without the oracle, GPT-4 detects only 0.9% of the violations, and with the oracle, it detects 22.9% of them. These results show the room for improvement in ML-based techniques for smart-contract auditing.
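The two evaluation settings described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' harness: the `Violation` schema, the prompt wording, and the function names are all assumptions made here, and no model is actually called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    """One labeled violation (hypothetical schema, not the dataset's format)."""
    contract_id: str
    rule: str        # text of the violated ERC rule
    code_site: str   # where the violation occurs, e.g. a function name


def open_prompt(source: str, rules: list[str]) -> str:
    """Setting 1: contract plus the rule list, with no hint about the violation."""
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(rules, start=1))
    return (
        "Contract:\n" + source + "\n\nERC rules:\n" + numbered
        + "\n\nList every rule the contract violates and the code site."
    )


def oracle_prompt(source: str, v: Violation) -> str:
    """Setting 2: name one rule and one code site, then ask True or False."""
    return (
        "Contract:\n" + source
        + f"\n\nDoes `{v.code_site}` violate this rule: {v.rule}\n"
        + "Answer True or False."
    )


def detection_rate(truth: set[Violation], flagged: set[Violation]) -> float:
    """Fraction of ground-truth violations that appear among the flagged ones."""
    return len(truth & flagged) / len(truth) if truth else 0.0
```

Under this sketch, the reported 0.9% and 22.9% figures would correspond to `detection_rate` computed over the first and second setting respectively. The abstract's counts also imply that 15,975 - 139 = 15,836 of the violations are injected.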
Problem

Research questions and friction points this paper is trying to address.

Ensuring smart contract compliance with safety standards.
Developing ML-based techniques for automated contract auditing.
Creating a dataset for smart-contract auditing research.
Innovation

Methods, ideas, or system contributions that make the work stand out.

ML-based automated contract auditing
SC-Bench dataset for Ethereum
GPT-4 evaluation with oracles
Shihao Xia
Pennsylvania State University
Mengting He
Pennsylvania State University
Linhai Song
Professor, Institute of Computing Technology, Chinese Academy of Sciences
Operating Systems, Software Engineering, Security, Programming Languages
Yiying Zhang
University of California, San Diego