I Know Who Clones Your Code: Interpretable Smart Contract Similarity Detection

📅 2025-09-11
🤖 AI Summary
Open-source code reuse in smart contracts spreads vulnerabilities across contracts, yet existing similarity detection methods offer limited semantic granularity and interpretability: abstract syntax tree (AST)-based approaches struggle with intricate tree structures, while deep learning methods often overlook code syntax and lack transparency. To address these limitations, the authors propose SmartDetector, a statement-level similarity detection framework for smart contract functions. It decomposes each function's AST into a series of statement trees, uses a classifier to score two functions by comparing each pair of their statement trees, and searches the classifier's hyperparameters with a mathematically derived cosine-wise diffusion process. Evaluated on three large real-world datasets, the method achieves an average F1-score of 95.88%, outperforming state-of-the-art methods by 14.01% on average.

📝 Abstract
Widespread reuse of open-source code in smart contract development boosts programming efficiency but significantly amplifies bug propagation across contracts, while dedicated methods for detecting similar smart contract functions remain very limited. Conventional abstract-syntax-tree (AST) based methods for smart contract similarity detection face challenges in handling intricate tree structures, which impedes detailed semantic comparison of code. Recent deep-learning based approaches tend to overlook code syntax and detection interpretability, resulting in suboptimal performance. To fill this research gap, we introduce SmartDetector, a novel approach for computing similarity between smart contract functions, explainable at the fine-grained statement level. Technically, SmartDetector decomposes the AST of a smart contract function into a series of smaller statement trees, each reflecting a structural element of the source code. Then, SmartDetector uses a classifier to compute the similarity score of two functions by comparing each pair of their statement trees. To address the infinite hyperparameter space of the classifier, we mathematically derive a cosine-wise diffusion process to efficiently search optimal hyperparameters. Extensive experiments conducted on three large real-world datasets demonstrate that SmartDetector outperforms current state-of-the-art methods by an average improvement of 14.01% in F1-score, achieving an overall average F1-score of 95.88%.
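The decomposition step described in the abstract can be illustrated with a minimal sketch. It is not SmartDetector's implementation: the `Node` class, the `STATEMENT_KINDS` set, and the helper names are hypothetical, and the toy AST only loosely follows Solidity node naming. The sketch walks a function's AST and cuts it into smaller statement trees, moving nested statements into trees of their own.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical set of node kinds treated as statement roots (an assumption,
# loosely following Solidity AST naming; not taken from the paper).
STATEMENT_KINDS = {
    "ExpressionStatement",
    "VariableDeclarationStatement",
    "IfStatement",
    "ForStatement",
    "WhileStatement",
    "ReturnStatement",
}


@dataclass
class Node:
    """Toy AST node: a kind label plus child nodes."""
    kind: str
    children: List["Node"] = field(default_factory=list)


def prune(node: Node) -> Node:
    """Copy a statement subtree, cutting out any nested statements."""
    kept = [prune(c) for c in node.children if c.kind not in STATEMENT_KINDS]
    return Node(node.kind, kept)


def split_into_statement_trees(root: Node) -> List[Node]:
    """Pre-order walk that emits one pruned tree per statement node."""
    trees: List[Node] = []

    def visit(node: Node) -> None:
        if node.kind in STATEMENT_KINDS:
            trees.append(prune(node))
        for child in node.children:
            visit(child)

    visit(root)
    return trees


def node_kinds(tree: Node) -> List[str]:
    """Flatten a tree into its node kinds in pre-order."""
    kinds = [tree.kind]
    for child in tree.children:
        kinds.extend(node_kinds(child))
    return kinds


# Toy function body: a call statement, an `if` with one nested assignment,
# and a return statement.
example_function = Node("FunctionDefinition", [Node("Block", [
    Node("ExpressionStatement", [
        Node("FunctionCall", [Node("Identifier"), Node("BinaryOperation")]),
    ]),
    Node("IfStatement", [
        Node("BinaryOperation"),
        Node("Block", [Node("ExpressionStatement", [Node("Assignment")])]),
    ]),
    Node("ReturnStatement", [Node("Identifier")]),
])])

for tree in split_into_statement_trees(example_function):
    print(node_kinds(tree))
```

In this sketch the `if` statement keeps its condition and an empty block, while the assignment nested inside it becomes a separate statement tree. Whether the actual method treats nested statements this way is an assumption.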
Problem

Research questions and friction points this paper is trying to address.

Detecting similar smart contract functions to prevent bug propagation
Overcoming AST-based method limitations in semantic code comparison
Addressing interpretability and syntax neglect in deep learning approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes AST into statement trees for comparison
Uses cosine-wise diffusion for hyperparameter optimization
Achieves interpretable similarity detection at statement level
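How statement-level comparison can make a function-level score explainable is sketched below. This is an illustration under stated assumptions, not the paper's classifier: each statement tree is represented as a flat list of node-kind labels, the learned classifier is swapped for a simple multiset Jaccard overlap, and the function names are hypothetical. The point is only that the function score comes with a list of which statement matched which.

```python
from collections import Counter
from typing import List, Tuple

Statement = List[str]  # a statement tree flattened to node-kind labels


def statement_similarity(a: Statement, b: Statement) -> float:
    """Multiset Jaccard overlap of node kinds (stand-in for a classifier)."""
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())
    if union == 0:
        return 1.0  # two empty statements are treated as identical
    return sum((ca & cb).values()) / union


def best_matches(src: List[Statement],
                 dst: List[Statement]) -> List[Tuple[int, int, float]]:
    """For each statement in src, find its most similar statement in dst."""
    matches = []
    for i, s in enumerate(src):
        scores = [statement_similarity(s, d) for d in dst]
        j = max(range(len(dst)), key=lambda k: scores[k])
        matches.append((i, j, scores[j]))
    return matches


def function_similarity(
    stmts_a: List[Statement], stmts_b: List[Statement]
) -> Tuple[float, List[Tuple[int, int, float]]]:
    """Symmetric average of best-match scores, plus the A-to-B alignment."""
    if not stmts_a or not stmts_b:
        return 0.0, []
    forward = best_matches(stmts_a, stmts_b)
    backward = best_matches(stmts_b, stmts_a)
    mean_fwd = sum(s for _, _, s in forward) / len(forward)
    mean_bwd = sum(s for _, _, s in backward) / len(backward)
    return (mean_fwd + mean_bwd) / 2, forward


func_a = [["ExpressionStatement", "FunctionCall", "Identifier"],
          ["ReturnStatement", "Identifier"]]
func_b = [["ReturnStatement", "Identifier"],
          ["ExpressionStatement", "FunctionCall", "Identifier"]]

score, alignment = function_similarity(func_a, func_b)
print(score)
for i, j, s in alignment:
    print(f"statement {i} of A matches statement {j} of B (score {s:.2f})")
```

The returned alignment is what gives the score its explanation: a reviewer can inspect each matched statement pair instead of trusting a single opaque number.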
Authors
Zhenguang Liu
Zhejiang University
Blockchain, Smart Contract Security, Multimedia
Lixun Ma
Zhejiang University (The State Key Laboratory of Blockchain and Data Security, Zhejiang University)
Zhongzheng Mu
School of Computer and Information Engineering, Zhejiang Gongshang University, China
Chengkun Wei
Zhejiang University
Network System, Data Privacy, Machine Learning Security
Xiaojun Xu
School of Computer and Information Engineering, Zhejiang Gongshang University, China
Yingying Jiao
Zhejiang University of Technology, Hangzhou 310023, China
Kui Ren
Professor and Dean of Computer Science, Zhejiang University, ACM/IEEE Fellow
Data Security & Privacy, AI Security, IoT & Vehicular Security