🤖 AI Summary
Neural code models are vulnerable to adversarial attacks, yet existing defenses are often computationally expensive, lack theoretical guarantees, and require white-box access. This work proposes ENBECOME—the first lightweight, training-free defense framework applicable in black-box settings—which smooths the model's decision boundaries by introducing semantics-preserving random perturbations to input code during inference. The method delivers both empirical robustness gains and certified guarantees: on a defect detection task, it reduces attack success rates from 42.43% to 9.74% with only a 0.29% drop in accuracy and attains an average certified robustness radius of 1.63. ENBECOME thus establishes the first formal robustness guarantee for neural code models under training-agnostic, black-box conditions.
📝 Abstract
With the development of deep learning, Neural Code Models (NCMs) such as CodeBERT and CodeLlama are widely used for code understanding tasks, including defect detection and code classification. However, recent studies have revealed that NCMs are vulnerable to adversarial examples: inputs with subtle perturbations that induce incorrect predictions while remaining difficult to detect. Existing defenses address this issue via data augmentation to empirically improve robustness, but they are costly, offer no theoretical robustness guarantees, and typically require white-box access to model internals, such as gradients. To address these challenges, we propose ENBECOME, a novel black-box, training-free, and lightweight adversarial defense. ENBECOME is designed to both enhance empirical robustness and report certified robustness boundaries for NCMs. ENBECOME operates solely during inference, introducing random, semantics-preserving perturbations to input code snippets to smooth the NCM's decision boundaries. This smoothing enables ENBECOME to formally certify a robustness radius within which adversarial examples can never induce misclassification, a property known as certified robustness. We conduct comprehensive experiments across multiple NCM architectures and tasks. Results show that ENBECOME significantly reduces attack success rates while maintaining high accuracy. For example, in defect detection, it reduces the average ASR from 42.43% to 9.74% with only a 0.29% drop in accuracy. Furthermore, ENBECOME achieves an average certified robustness radius of 1.63, meaning that adversarial modifications to no more than 1.63 identifiers are provably ineffective.
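The inference-time smoothing described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the regex-based identifier extraction, the 0.5 perturbation probability, the `var_*` renaming scheme, and the majority-vote aggregation over `n_samples` perturbed copies are all assumptions made for the sake of a runnable example.

```python
import keyword
import random
import re
from collections import Counter

# Treat Python keywords as non-renamable; everything else matching an
# identifier pattern is a candidate for semantics-preserving renaming.
KEYWORDS = set(keyword.kwlist)

def rename_identifiers(code: str, rng: random.Random) -> str:
    """Randomly rename a subset of identifiers in a code snippet.

    Identifier renaming preserves program semantics, so a correct
    classifier's decision should be stable under this perturbation.
    (Naive regex extraction; illustrative only.)
    """
    idents = set(re.findall(r"\b[A-Za-z_]\w*\b", code)) - KEYWORDS
    perturbed = code
    for name in idents:
        if rng.random() < 0.5:  # hypothetical perturbation probability
            fresh = f"var_{rng.randrange(10**6)}"
            perturbed = re.sub(rf"\b{re.escape(name)}\b", fresh, perturbed)
    return perturbed

def smoothed_predict(classify, code: str, n_samples: int = 100, seed: int = 0):
    """Smoothed classifier: majority vote over randomly perturbed copies.

    `classify` is any black-box predictor (code -> label); no gradients
    or training access are needed, matching the black-box setting.
    """
    rng = random.Random(seed)
    votes = Counter(
        classify(rename_identifiers(code, rng)) for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]
```

In a certified-robustness setting, the same vote counts would also be used to bound a radius (here, a number of identifiers) within which no adversarial renaming can flip the majority label; that certification step is omitted from this sketch.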