🤖 AI Summary
Open-source code reuse in smart contracts exacerbates cross-contract vulnerability propagation, yet existing similarity detection methods suffer from insufficient semantic granularity and poor interpretability: abstract syntax tree (AST)-based approaches struggle to model complex structural patterns, while deep-learning methods often neglect syntactic constraints and lack transparency. To address these limitations, we propose a fine-grained, statement-level similarity detection framework. It decomposes ASTs into sequences of statement trees, builds a syntax-aware statement-level classifier, and introduces a cosine-wise diffusion process that automates hyperparameter search while preserving statement-level interpretability. Evaluated on three real-world datasets, our method achieves an average F1-score of 95.88%, outperforming the state-of-the-art by 14.01%, significantly improving both the accuracy of vulnerability clone identification and the robustness of detection.
📝 Abstract
Widespread reuse of open-source code in smart contract development boosts programming efficiency but significantly amplifies bug propagation across contracts, yet dedicated methods for detecting similar smart contract functions remain scarce. Conventional abstract syntax tree (AST)-based methods for smart contract similarity detection struggle with intricate tree structures, which impedes detailed semantic comparison of code. Recent deep-learning-based approaches tend to overlook code syntax and detection interpretability, resulting in suboptimal performance.
To fill this research gap, we introduce SmartDetector, a novel approach for computing the similarity between smart contract functions that is explainable at the fine-grained statement level. Technically, SmartDetector decomposes the AST of a smart contract function into a series of smaller statement trees, each reflecting a structural element of the source code. SmartDetector then uses a classifier to compute the similarity score of two functions by comparing each pair of their statement trees. Because the classifier's hyperparameter space is effectively infinite, we mathematically derive a cosine-wise diffusion process to search for optimal hyperparameters efficiently. Extensive experiments on three large real-world datasets demonstrate that SmartDetector outperforms current state-of-the-art methods by an average of 14.01% in F1-score, achieving an overall average F1-score of 95.88%.