Logic Meets Magic: LLMs Cracking Smart Contract Vulnerabilities

📅 2025-01-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit high false-positive rates and poor version adaptability when detecting vulnerabilities in Solidity v0.8 smart contracts. Method: This work presents the first systematic security evaluation of LLMs specifically for Solidity v0.8, using a cross-version comparative experimental framework covering five state-of-the-art LLMs (including GPT-4, Claude-3, and Qwen2) to assess performance degradation. Results reveal a sharp drop in recall for certain vulnerability types, to as low as 13% under v0.8, exposing the LLMs' overreliance on patterns tied to outdated libraries and superseded framework behavior. The authors propose targeted prompt engineering techniques that reduce false positives by over 60%. Contributions: (1) the first empirical identification of LLM performance degradation on Solidity v0.8; (2) the first benchmark evaluation results for v0.8; and (3) a methodological foundation and an empirical warning for trustworthy, AI-augmented smart contract auditing.

📝 Abstract
Smart contract vulnerabilities have caused significant economic losses in blockchain applications. Large Language Models (LLMs) offer new possibilities for the time-consuming task of vulnerability detection. However, state-of-the-art LLM-based detection solutions are often plagued by high false-positive rates. In this paper, we push the boundaries of existing research in two key ways. First, our evaluation is based on Solidity v0.8, offering more up-to-date insights than prior studies, which focus on older versions (e.g., v0.4). Second, we leverage five of the latest LLMs (from different vendors), ensuring comprehensive coverage of the most advanced capabilities in the field. We conducted a series of rigorous evaluations. Our experiments demonstrate that a well-designed prompt can reduce the false-positive rate by over 60%. Surprisingly, we also discovered that the recall rate for detecting some specific vulnerabilities in Solidity v0.8 has dropped to just 13% compared to earlier versions (i.e., v0.4). Further analysis traces this decline to the LLMs' reliance on outdated patterns: they fail to account for the changes introduced by new libraries and frameworks in recent versions.
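To make the prompt-engineering finding concrete, here is a minimal sketch of what a version-aware audit prompt might look like. The function, wording, and constants below are hypothetical illustrations, not the paper's actual prompts; the one factual anchor is that Solidity v0.8 added default checked arithmetic (reverting on overflow/underflow unless inside an `unchecked` block), which is exactly the kind of version-specific change that can turn classic overflow findings into false positives.

```python
# Hypothetical version-aware audit prompt builder (not the paper's prompts).
# Idea: pin the compiler version and state v0.8 semantics up front, so the
# model does not flag arithmetic overflow that the compiler already checks.

V08_CONTEXT = (
    "The contract targets Solidity v0.8. Since v0.8, arithmetic "
    "overflow/underflow reverts by default, so SafeMath-style overflow "
    "findings are usually false positives unless an `unchecked` block is used."
)

def build_audit_prompt(source_code: str, vulnerability_types: list) -> str:
    """Assemble a detection prompt that embeds version-specific guidance."""
    checklist = "\n".join(f"- {v}" for v in vulnerability_types)
    return (
        f"{V08_CONTEXT}\n\n"
        f"Audit the contract below for these vulnerability classes only:\n"
        f"{checklist}\n\n"
        f"Report a finding only if you can cite the exact line; "
        f"otherwise answer 'no issue found'.\n\n"
        f"```solidity\n{source_code}\n```"
    )

prompt = build_audit_prompt(
    "contract Demo { function f(uint a, uint b) public pure "
    "returns (uint) { return a + b; } }",
    ["reentrancy", "integer overflow inside unchecked blocks"],
)
print(prompt)
```

The two guardrails sketched here (a version-semantics preamble and a "cite the exact line or say no issue" instruction) illustrate the general direction of targeted prompting for reducing false positives; the paper's concrete directive strategies may differ.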
Problem

Research questions and friction points this paper is trying to address.

Blockchain
Smart Contracts
Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Solidity v0.8
Large Language Models
Optimized Directive Strategies