ParaVul: A Parallel Large Language Model and Retrieval-Augmented Framework for Smart Contract Vulnerability Detection

📅 2025-10-19
🤖 AI Summary
Existing smart contract vulnerability detection methods suffer from high false-positive rates and poor scalability, while large language models (LLMs) incur high inference costs and computational overhead. To address these challenges, this paper proposes ParaVul, a parallel LLM and retrieval-augmented detection framework. Its key contributions are: (1) efficient LLM fine-tuning via Sparse Low-Rank Adaptation (SLoRA), which adds a sparse matrix to quantized LoRA-based LLMs; (2) a hybrid Retrieval-Augmented Generation (RAG) system that combines dense retrieval with BM25 to help verify LLM outputs; (3) a meta-learning model that fuses the LLM and RAG outputs into the final detection result; and (4) chain-of-thought prompts that guide the LLM to generate comprehensive vulnerability detection reports after detection. In simulations, ParaVul achieves F1 scores of 0.9398 for single-label and 0.9330 for multi-label vulnerability detection, which the authors report as superior to the compared approaches.
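The summary reports F1 scores for single-label and multi-label detection without stating how the multi-label score is averaged. As a reminder of what the two numbers measure, the sketch below assumes plain binary F1 for the single-label case and micro-averaged F1 for the multi-label case. The averaging choice, the function names, and the toy labels are assumptions, not the paper's evaluation code.

```python
from typing import Sequence, Set


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), rewritten in counts as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Single-label case: one vulnerable (1) / safe (0) decision per contract."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return f1_from_counts(tp, fp, fn)


def micro_f1(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]]) -> float:
    """Multi-label case: each contract may carry several vulnerability types.
    Micro-averaging (an assumption here) pools TP/FP/FN over every
    (contract, label) pair before computing F1."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        tp += len(t & p)  # types correctly flagged
        fp += len(p - t)  # types flagged but not present
        fn += len(t - p)  # types present but missed
    return f1_from_counts(tp, fp, fn)


# Toy labels (hypothetical, not from the paper).
print(round(binary_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]), 4))  # 0.6667
truth = [{"reentrancy"}, {"overflow", "timestamp"}, set()]
preds = [{"reentrancy"}, {"overflow"}, {"timestamp", "reentrancy"}]
print(round(micro_f1(truth, preds), 4))  # 0.5714
```

Macro-averaging (F1 per vulnerability type, then the mean) is the other common choice and can give noticeably different numbers when some types are rare.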

📝 Abstract
Smart contracts play a significant role in automating blockchain services. Nevertheless, vulnerabilities in smart contracts pose serious threats to blockchain security. Currently, traditional detection methods primarily rely on static analysis and formal verification, which can result in high false-positive rates and poor scalability. Large Language Models (LLMs) have recently made significant progress in smart contract vulnerability detection. However, they still face challenges such as high inference costs and substantial computational overhead. In this paper, we propose ParaVul, a parallel LLM and retrieval-augmented framework to improve the reliability and accuracy of smart contract vulnerability detection. Specifically, we first develop Sparse Low-Rank Adaptation (SLoRA) for LLM fine-tuning. SLoRA introduces sparsification by incorporating a sparse matrix into quantized LoRA-based LLMs, thereby reducing computational overhead and resource requirements while enhancing their ability to understand vulnerability-related issues. We then construct a vulnerability contract dataset and develop a hybrid Retrieval-Augmented Generation (RAG) system that integrates dense retrieval with Best Matching 25 (BM25), assisting in verifying the results generated by the LLM. Furthermore, we propose a meta-learning model to fuse the outputs of the RAG system and the LLM, thereby generating the final detection results. After completing vulnerability detection, we design chain-of-thought prompts to guide LLMs to generate comprehensive vulnerability detection reports. Simulation results demonstrate the superiority of ParaVul, especially in terms of F1 scores, achieving 0.9398 for single-label detection and 0.9330 for multi-label detection.
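The abstract describes SLoRA only as incorporating a sparse matrix into quantized LoRA-based LLMs. The sketch below is one plausible reading of that sentence, not the authors' implementation: a frozen base weight W0 is quantized (symmetric int8 absmax here, a hypothetical stand-in for QLoRA-style 4-bit quantization), a low-rank update (alpha / r) B A is added as in LoRA, and a sparse matrix S, stored as a dict of its nonzeros, contributes a few extra trainable entries, giving y = dequant(Q(W0)) x + (alpha / r) B A x + S x. The class SparseLoRALinear and all of its parameters are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def quantize_int8(w: Matrix) -> Tuple[List[List[int]], float]:
    """Symmetric per-tensor absmax quantization to int8 (hypothetical stand-in
    for the quantizer actually used with SLoRA, which is not given here)."""
    absmax = max(abs(v) for row in w for v in row) or 1.0
    scale = absmax / 127.0
    return [[round(v / scale) for v in row] for row in w], scale


class SparseLoRALinear:
    """Hypothetical SLoRA-style linear layer:
        y = dequant(Q(W0)) x + (alpha / r) * B (A x) + S x
    W0 is frozen and quantized; A (r x d_in), B (d_out x r) and the sparse
    matrix S (stored as {(row, col): value}) are the trainable parts."""

    def __init__(self, w0: Matrix, a: Matrix, b: Matrix,
                 s: Dict[Tuple[int, int], float], alpha: float) -> None:
        self.q, self.q_scale = quantize_int8(w0)  # frozen base weights
        self.a, self.b = a, b                     # low-rank adapters
        self.s = s                                # sparse correction
        self.r = len(a)
        self.alpha = alpha

    def forward(self, x: Vector) -> Vector:
        # Base path: dequantize on the fly, then multiply.
        base = matvec([[v * self.q_scale for v in row] for row in self.q], x)
        # Low-rank path: B (A x), scaled by alpha / r as in LoRA.
        low_rank = matvec(self.b, matvec(self.a, x))
        k = self.alpha / self.r
        y = [bv + k * lv for bv, lv in zip(base, low_rank)]
        # Sparse path: only the stored nonzeros of S contribute.
        for (i, j), v in self.s.items():
            y[i] += v * x[j]
        return y

    def trainable_params(self) -> int:
        return (sum(len(row) for row in self.a)
                + sum(len(row) for row in self.b) + len(self.s))


layer = SparseLoRALinear(
    w0=[[0.5, -1.0], [0.25, 0.75]],
    a=[[1.0, 0.0]],        # r = 1
    b=[[0.1], [0.0]],
    s={(1, 0): 0.2},       # a single trainable nonzero
    alpha=1.0,
)
print(layer.forward([1.0, 2.0]))  # close to [-1.4, 1.95], up to quantization error
print(layer.trainable_params())   # 5
```

Only A, B, and the nonzeros of S would be updated during fine-tuning; the quantized W0 stays frozen, which is where this family of methods gets its memory savings.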
Problem

Research questions and friction points this paper is trying to address.

Detecting smart contract vulnerabilities with high accuracy
Reducing computational costs in LLM-based vulnerability detection
Improving reliability through hybrid retrieval-augmented verification systems
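The abstract states that the hybrid RAG system integrates dense retrieval with Best Matching 25 (BM25) to help verify LLM outputs, but not how the two rankings are merged. The sketch below assumes standard Okapi BM25 (k1 = 1.5, b = 0.75, common defaults), uses character-trigram cosine similarity as a toy stand-in for a neural dense encoder, and merges the two rankings with reciprocal rank fusion (RRF, k = 60). These three choices, the function names, and the corpus entries are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document for the tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            f = tf[term]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def trigram_embed(text: str) -> Counter:
    """Toy 'dense' encoder: character-trigram counts (stand-in for a neural model)."""
    t = f"  {text.lower()} "
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(c * v[g] for g, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def ranks(scores: List[float]) -> List[int]:
    """1-based rank of each document, rank 1 = highest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    r = [0] * len(scores)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r


def hybrid_search(query: str, corpus: List[str], k: int = 60,
                  top_n: int = 3) -> List[int]:
    """Fuse BM25 and dense rankings with reciprocal rank fusion."""
    sparse = bm25_scores(tokenize(query), [tokenize(d) for d in corpus])
    q_vec = trigram_embed(query)
    dense = [cosine(q_vec, trigram_embed(d)) for d in corpus]
    fused = [1 / (k + rs) + 1 / (k + rd)
             for rs, rd in zip(ranks(sparse), ranks(dense))]
    return sorted(range(len(corpus)), key=lambda i: fused[i], reverse=True)[:top_n]


corpus = [  # hypothetical knowledge-base entries
    "reentrancy: external call msg.sender.call before balance update",
    "integer overflow in unchecked token balance arithmetic",
    "timestamp dependence: block.timestamp used as randomness source",
]
print(hybrid_search("external call before updating balance allows reentrancy", corpus))
```

RRF only needs ranks, so it sidesteps the fact that BM25 scores and cosine similarities live on different scales; a weighted sum of normalized scores is a common alternative.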
Innovation

Methods, ideas, or system contributions that make the work stand out.

SLoRA fine-tuning reduces computational overhead for LLMs
Hybrid RAG system integrates dense retrieval with BM25
Meta-learning model fuses RAG and LLM outputs
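The abstract does not say which meta-learning model fuses the two outputs. A common way to combine base predictors is stacking, where a small meta-classifier is trained on the base models' held-out outputs; the sketch below assumes that reading and fits a logistic regression over two features, p_llm and p_rag, with full-batch gradient descent. The class StackingFuser, the feature choice, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class StackingFuser:
    """Hypothetical meta-learner: logistic regression over base-model outputs.
    Each input row is (p_llm, p_rag): the LLM's and the RAG system's
    probability that a contract is vulnerable."""

    def __init__(self, n_features: int = 2) -> None:
        self.w: List[float] = [0.0] * n_features
        self.b = 0.0

    def predict_proba(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def predict(self, x: Sequence[float], threshold: float = 0.5) -> int:
        return int(self.predict_proba(x) >= threshold)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int],
            lr: float = 1.0, epochs: int = 3000) -> "StackingFuser":
        """Full-batch gradient descent on the mean log-loss."""
        n = len(X)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, t in zip(X, y):
                err = self.predict_proba(x) - t  # d(log-loss)/d(logit)
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            self.w = [wi - lr * g / n for wi, g in zip(self.w, grad_w)]
            self.b -= lr * grad_b / n
        return self


# Hypothetical held-out outputs: (p_llm, p_rag) -> true label.
X = [(0.9, 0.9), (0.8, 0.8), (0.6, 0.7), (0.9, 0.1),
     (0.3, 0.2), (0.1, 0.1), (0.2, 0.2), (0.7, 0.2)]
y = [1, 1, 1, 0, 0, 0, 0, 0]
fuser = StackingFuser().fit(X, y)
print(fuser.predict((0.9, 0.1)))  # 0: confident LLM, unsupported by retrieval
print(fuser.predict((0.8, 0.8)))  # 1: both sources agree
```

In this toy data the retrieval score is what separates the classes, so the fitted model can overrule a confident LLM prediction that the retrieved evidence does not support, which matches the verification role the abstract assigns to the RAG system.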
Tenghui Huang
School of Automation, Guangdong University of Technology, Guangzhou 510006, China
Jinbo Wen
M.S. Student, Nanjing University of Aeronautics and Astronautics
GenAI+Networking, Contract Theory, Metaverse, Blockchain
Jiawen Kang
School of Automation, Guangdong University of Technology, Guangzhou 510006, China
Siyong Chen
School of Automation, Guangdong University of Technology, Guangzhou 510006, China
Zhengtao Li
School of Automation, Guangdong University of Technology, Guangzhou 510006, China
Tao Zhang
School of Cyberspace Science and Technology, Beijing Jiaotong University, Beijing 100044, China
Dongning Liu
School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
Jiacheng Wang
Nanyang Technological University
ISAC, GenAI, Low-altitude wireless network, Semantic Communications
Chengjun Cai
Department of Computer Science, City University of Hong Kong, Hong Kong
Yinqiu Liu
College of Computing and Data Science, Nanyang Technological University, Singapore
Dusit Niyato
College of Computing and Data Science, Nanyang Technological University, Singapore