An Empirical Analysis of Vulnerability Detection Tools for Solidity Smart Contracts Using Line Level Manually Annotated Vulnerabilities

📅 2025-05-21
🤖 AI Summary
This paper addresses the uneven accuracy and reliability of existing Solidity smart contract vulnerability detection tools. The authors evaluate the 20 analysis tools included in the SmartBugs 2.0 framework on a dataset of 2,182 instances manually annotated with line-level vulnerability labels, categorized by the DASP TOP 10 taxonomy, and analyze how combining multiple tools affects detection. The study finds that a set of three tools, used together, detects up to 76.78% of the annotated vulnerabilities while taking less than one minute to run on average. In contrast, a large language model (LLM)-based detection method gave inconsistent results across two popular datasets, indicating unreliable detection on real-world contracts. Key contributions include: (1) releasing the largest dataset of manually analyzed smart contracts with line-level vulnerability annotations; and (2) the empirical evaluation of the greatest number of tools to date, including evidence that combining tools improves vulnerability identification.

📝 Abstract
The rapid adoption of blockchain technology has highlighted the importance of securing smart contracts, given their critical role in executing automated business logic on blockchain platforms. This paper provides an empirical evaluation of automated vulnerability analysis tools designed for Solidity smart contracts. Leveraging the SmartBugs 2.0 framework, which includes 20 analysis tools, we conducted a comprehensive assessment on a dataset of 2,182 instances that we manually annotated with line-level vulnerability labels. Our evaluation measures how effectively these tools detect various types of vulnerabilities, as categorized by the DASP TOP 10 taxonomy. We also evaluated a Large Language Model-based detection method on two popular datasets; the results were inconsistent between the two, indicating unreliable detection when analyzing real-world smart contracts. Our study identifies significant variations in the accuracy and reliability of different tools and demonstrates the advantages of combining multiple detection methods to improve vulnerability identification. We identified a set of 3 tools that, combined, find up to 76.78% of the vulnerabilities while taking less than one minute to run, on average. This study contributes to the field by releasing the largest dataset of manually analyzed smart contracts with line-level vulnerability annotations and by providing the empirical evaluation of the greatest number of tools to date.
Problem

Research questions and friction points this paper is trying to address.

Evaluating effectiveness of Solidity smart contract vulnerability detection tools
Assessing accuracy and reliability of automated vulnerability analysis methods
Identifying optimal tool combinations for improved vulnerability detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

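Evaluated the 20 Solidity analysis tools included in the SmartBugs 2.0 framework
Assessed detection against 2,182 instances manually annotated with line-level vulnerability labels
Identified a set of 3 tools that, combined, find up to 76.78% of vulnerabilities in under one minute on average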
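
The idea of combining tools can be illustrated with a small sketch. It is not the authors' procedure: it assumes that every annotation and every tool finding can be reduced to a (contract, line, category) triple, that a finding counts only on an exact match, and that the best set is chosen by exhaustive search over detection rate alone (runtime is not modeled). The contract names, tool names, and data are all hypothetical.

```python
from itertools import combinations

# Hypothetical ground truth: one (contract, line, category) triple
# per manually annotated vulnerability.
ANNOTATED = {
    ("Bank.sol", 42, "reentrancy"),
    ("Bank.sol", 57, "unchecked_low_level_calls"),
    ("Token.sol", 13, "arithmetic"),
    ("Token.sol", 88, "access_control"),
    ("Lottery.sol", 21, "bad_randomness"),
}

# Hypothetical findings per tool, in the same triple form.
# Triples absent from ANNOTATED act as false positives.
FINDINGS = {
    "tool_a": {("Bank.sol", 42, "reentrancy"), ("Token.sol", 13, "arithmetic")},
    "tool_b": {("Bank.sol", 42, "reentrancy"),
               ("Bank.sol", 57, "unchecked_low_level_calls")},
    "tool_c": {("Token.sol", 88, "access_control"), ("Token.sol", 99, "arithmetic")},
    "tool_d": {("Lottery.sol", 99, "bad_randomness")},
}


def detection_rate(tools, findings=FINDINGS, annotated=ANNOTATED):
    """Fraction of annotated vulnerabilities reported by at least one tool."""
    reported = set().union(*(findings[t] for t in tools))
    return len(reported & annotated) / len(annotated)


def best_combination(k, findings=FINDINGS, annotated=ANNOTATED):
    """Exhaustively search all k-tool subsets for the highest detection rate."""
    return max(
        combinations(sorted(findings), k),
        key=lambda combo: detection_rate(combo, findings, annotated),
    )


if __name__ == "__main__":
    best = best_combination(3)
    print(best, detection_rate(best))
```

With 20 tools there are only 1,140 three-tool subsets, so an exhaustive search of this kind stays cheap; a real analysis would also need to weigh runtime and false positives, which this sketch ignores.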