🤖 AI Summary
This work addresses the insufficient robustness of large language model (LLM)-based vulnerability detectors under semantics-preserving code perturbations, which renders them susceptible to evasion. We present the first systematic investigation showing that state-of-the-art LLM detectors fail widely on a unified C/C++ benchmark when inputs are subjected to semantically equivalent transformations. To quantify this vulnerability, we introduce a carrier-based joint robustness metric and develop a comprehensive evaluation framework encompassing diverse semantics-preserving transformations, adversarial example generation, and both black-box and white-box attacks. Experimental results demonstrate that even high-performing detectors, while accurate on clean samples, suffer significant performance degradation under perturbation. Moreover, universal adversarial strings exhibit strong transferability to black-box APIs, and leveraging gradient information further enhances evasion success rates.
📝 Abstract
LLM-based vulnerability detectors are increasingly deployed in security-critical code review, yet their resilience to evasion under behavior-preserving edits remains poorly understood. We evaluate detection-time integrity under a semantics-preserving threat model by instantiating diverse behavior-preserving code transformations on a unified C/C++ benchmark (N=5000), and introduce a metric of joint robustness across different attack methods/carriers. Across models, we observe a systemic failure under semantics-preserving adversarial transformations: even state-of-the-art vulnerability detectors perform well on clean inputs, yet their predictions flip under behavior-equivalent edits. Universal adversarial strings optimized on a single surrogate model remain effective when transferred to black-box APIs, and gradient access can further amplify evasion success. These results show that even high-performing detectors are vulnerable to low-cost, semantics-preserving evasion. Our carrier-based metrics provide practical diagnostics for evaluating LLM-based code detectors.
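To make the semantics-preserving threat model concrete, the sketch below applies two classic behavior-preserving edits to a C snippet: renaming a local identifier and inserting a dead statement at function entry. These two transformations are illustrative assumptions on our part (the paper's actual transformation set is not specified here); neither changes the program's runtime behavior, yet either can shift a token-level detector's prediction.

```python
import re

def rename_identifier(c_src: str, old: str, new: str) -> str:
    """Rename an identifier using word-boundary matching,
    leaving substrings of longer names untouched."""
    return re.sub(rf"\b{re.escape(old)}\b", new, c_src)

def insert_dead_code(c_src: str) -> str:
    """Insert a no-op statement after the first '{' (function entry).
    The padding variable is never read, so behavior is unchanged."""
    return c_src.replace("{", "{ int __pad = 0; (void)__pad;", 1)

# Hypothetical vulnerable sample in the style of a C/C++ benchmark entry.
vulnerable = """int copy(char *dst, char *src, int n) {
    for (int i = 0; i <= n; i++)   /* off-by-one: writes n+1 bytes */
        dst[i] = src[i];
    return 0;
}"""

# Chain both transformations; the off-by-one bug is still present.
perturbed = insert_dead_code(rename_identifier(vulnerable, "i", "idx"))
print(perturbed)
```

A robust detector should give the same verdict on `vulnerable` and `perturbed`; the paper's finding is that current LLM detectors frequently do not.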