🤖 AI Summary
Open-source code reuse in smart contracts exacerbates cross-contract vulnerability propagation, yet existing similarity detection methods suffer from insufficient semantic granularity and poor interpretability: abstract syntax tree (AST)-based approaches struggle to model complex structural patterns, while deep-learning methods often neglect syntactic constraints and lack transparency. To address these limitations, we propose a fine-grained, statement-level similarity detection framework. It decomposes ASTs into sequences of statement trees, builds a syntax-aware statement-level classifier, and introduces a cosine-wise diffusion process that automates hyperparameter search while preserving statement-level interpretability. Evaluated on three real-world datasets, our method achieves an average F1-score of 95.88%, outperforming the state-of-the-art by 14.01%, significantly improving both the accuracy of vulnerability clone identification and the robustness of detection.
📝 Abstract
Widespread reuse of open-source code in smart contract development boosts programming efficiency but significantly amplifies bug propagation across contracts, yet dedicated methods for detecting similar smart contract functions remain scarce. Conventional abstract syntax tree (AST)-based methods for smart contract similarity detection struggle with intricate tree structures, which impedes detailed semantic comparison of code. Recent deep-learning-based approaches tend to overlook code syntax and detection interpretability, resulting in suboptimal performance.
To fill this research gap, we introduce SmartDetector, a novel approach for computing the similarity between smart contract functions that is explainable at the fine-grained statement level. Technically, SmartDetector decomposes the AST of a smart contract function into a series of smaller statement trees, each reflecting a structural element of the source code. SmartDetector then uses a classifier to compute the similarity score of two functions by comparing each pair of their statement trees. Because the classifier's hyperparameter space is effectively infinite, we mathematically derive a cosine-wise diffusion process to search for optimal hyperparameters efficiently. Extensive experiments on three large real-world datasets demonstrate that SmartDetector outperforms current state-of-the-art methods by an average of 14.01% in F1-score, achieving an overall average F1-score of 95.88%.