🤖 AI Summary
This work addresses the performance degradation in smart contract vulnerability detection caused by label noise, which arises when training labels come from open-source annotation tools of unverified accuracy. To mitigate this issue, the authors propose CGBC, an approach that inserts a granular-ball computing layer between the encoder and the classifier to cluster similar contracts into coarse-grained representations, thereby correcting noisy labels. The method further improves robustness by combining unsupervised contrastive pretraining, semantics-preserving data augmentation, and a symmetric cross-entropy loss. The work pairs granular-ball computing with contrastive learning for vulnerability detection and introduces tailored intra- and inter-granular-ball loss functions. Experiments show that CGBC outperforms the baselines, alleviating the adverse impact of label noise and improving both detection effectiveness and model robustness.
📝 Abstract
Deep neural networks (DNNs) have emerged as a prominent approach for detecting smart contract vulnerabilities, driven by growing contract datasets and advances in deep learning techniques. However, DNNs typically require large-scale labeled datasets to model the relationships between contract features and vulnerability labels. In practice, the labeling process often depends on existing open-source tools, whose accuracy cannot be guaranteed. Consequently, label noise poses a significant challenge to the accuracy and robustness of smart contract vulnerability detection, a problem that is rarely explored in the literature. To this end, we propose Contrastive learning-enhanced Granular-Ball smart Contracts training (CGBC) to enhance the robustness of contract vulnerability detection. Specifically, CGBC first introduces a granular-ball computing layer between the encoder layer and the classifier layer. This layer groups similar contracts into Granular-Balls (GBs) and generates new coarse-grained representations for them (i.e., the center and the label of each GB), so that noisy labels can be corrected based on the mostly correct samples within a GB. An inter-GB compactness loss and an intra-GB looseness loss are combined to enhance the effectiveness of clustering. Then, to improve the accuracy of the GBs, we pretrain the model through unsupervised contrastive learning supported by our novel semantic-consistent smart contract augmentation method. This procedure helps discriminate contracts with different labels by pulling the representations of similar contracts closer together, which assists CGBC in clustering. Subsequently, we use the symmetric cross-entropy loss function to measure model quality, which mitigates the effect of label noise in gradient computations. Finally, extensive experiments show that the proposed CGBC significantly improves the robustness and effectiveness of smart contract vulnerability detection compared with the baselines.
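To make the role of the symmetric cross-entropy loss more concrete, the following is a minimal sketch of its commonly used form: a weighted sum of the standard cross-entropy and a reverse cross-entropy term in which the log of a zero label probability is replaced by a negative constant. The weights `alpha` and `beta`, the constant `log_zero`, and the function names are illustrative assumptions; the abstract does not state the values or the exact formulation used in CGBC.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def symmetric_cross_entropy(logits, label, alpha=1.0, beta=1.0, log_zero=-4.0):
    """Symmetric cross-entropy for one sample with an integer class label.

    alpha, beta and log_zero are assumed hyperparameters, not values
    taken from the CGBC paper.
    """
    probs = softmax(logits)

    # Standard cross-entropy: -log p(label); clamp to avoid log(0).
    ce = -math.log(max(probs[label], 1e-7))

    # Reverse cross-entropy: -sum_k p_k * log q_k, where q is the one-hot
    # label. log(1) = 0 for the labeled class; log(0) is replaced by log_zero.
    rce = -sum(
        p * (0.0 if k == label else log_zero)
        for k, p in enumerate(probs)
    )

    return alpha * ce + beta * rce


if __name__ == "__main__":
    # Two-class example: vulnerable (1) vs. non-vulnerable (0).
    print(symmetric_cross_entropy([2.0, -1.0], label=0))
    print(symmetric_cross_entropy([2.0, -1.0], label=1))
```

The reverse term is bounded by `-log_zero`, so a confidently predicted sample whose label is wrong contributes a limited penalty through it, which is the usual argument for why this loss is more tolerant of label noise than cross-entropy alone.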