AI Summary
This study investigates how minimal prompt perturbations affect the security of code generated by large language models. The authors apply token-level mutations, some as small as a single-character change, to prompts across three prominent models and five programming languages, then use hidden-state probing and AUC-based evaluation to analyze how such subtle changes induce security vulnerabilities. The work shows, for the first time, that a single-character modification can turn otherwise secure code into vulnerable output. Probes on the models' hidden states also show that input-handling vulnerabilities (mean AUC = 0.753) are more predictable from the prompt than secure-defaults vulnerabilities (mean AUC = 0.674). Building on these findings, the paper points to differentiated defenses: pre-generation detection for input-handling flaws and in-decoding intervention for secure-defaults flaws.
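To make the perturbation setup concrete, here is a minimal sketch of enumerating one-edit prompt variants and measuring how often they flip a secure baseline to vulnerable output. It works at the character level as a stand-in for the paper's token-level mutations, and the operator set (delete, substitute, insert), the `alphabet` parameter, and the `generate` and `is_vulnerable` callables are all hypothetical assumptions, not the authors' implementation.

```python
import string
from typing import Callable, Optional


def single_char_mutations(prompt: str, alphabet: str = string.ascii_lowercase) -> list[str]:
    """Return every distinct prompt exactly one character edit away from `prompt`.

    Character-level stand-in for token-level mutations; the operator set
    (delete / substitute / insert) and `alphabet` are illustrative assumptions.
    """
    variants: set[str] = set()
    for i, ch in enumerate(prompt):
        variants.add(prompt[:i] + prompt[i + 1:])  # delete character i
        for c in alphabet:
            if c != ch:
                variants.add(prompt[:i] + c + prompt[i + 1:])  # substitute character i
    for i in range(len(prompt) + 1):
        for c in alphabet:
            variants.add(prompt[:i] + c + prompt[i:])  # insert c before position i
    return sorted(variants)


def flip_rate(
    prompt: str,
    variants: list[str],
    generate: Callable[[str], str],        # hypothetical: prompt -> generated code
    is_vulnerable: Callable[[str], bool],  # hypothetical: e.g. a static-analysis check
) -> Optional[float]:
    """Fraction of variants whose generated code is vulnerable, given that the
    unperturbed prompt yields secure code. Returns None when the baseline is
    already vulnerable (nothing to flip) or when there are no variants."""
    if not variants or is_vulnerable(generate(prompt)):
        return None
    flips = sum(is_vulnerable(generate(v)) for v in variants)
    return flips / len(variants)
```

Full enumeration grows with prompt length times alphabet size, so an experiment of this kind would more plausibly sample a subset of variants per prompt.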
Abstract
LLM-based coding assistants are seeing rapid adoption, offering substantial gains in developer productivity. As organizations increasingly ship code these agents produce, the security of that code becomes critical. Prior work has shown that minor prompt perturbations degrade the functional correctness of LLM-generated code, but whether they also compromise code security has remained unstudied. We apply token-level mutations to prompts across three models and five programming languages, and show that mutations as small as a single-character change can flip generated code from secure to vulnerable. Probing the models' hidden states reveals that this fragility is partially encoded in prompt representations, but unevenly so. Input-handling vulnerabilities, where the model omits validation or sanitization, are more predictable (mean AUC 0.753) than secure-defaults vulnerabilities, where insecure code stems from one local choice such as a weak algorithm or unsafe parameter (mean AUC 0.674). These results show that the threat model for LLM-assisted coding extends beyond prompt injection to ordinary prompt variation, and indicate that input-handling flaws can be caught before generation while secure-defaults flaws require intervention during decoding.
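The AUC figures summarize how well a probe's scores on prompt representations separate prompts whose generated code becomes vulnerable from those whose code stays secure: 0.5 is chance and 1.0 is perfect separation. The sketch below computes AUC as the probability that a randomly chosen positive outranks a randomly chosen negative (the Mann-Whitney formulation) and then averages it over cells. It assumes the probe scores already exist (for instance from a logistic-regression probe on hidden states, not shown), and averaging over (model, language) cells is an assumption about how the reported means were formed.

```python
from statistics import mean


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC = P(score of a random positive > score of a random negative),
    with ties counted as one half (Mann-Whitney U / (n_pos * n_neg))."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(per_cell: dict[str, tuple[list[float], list[int]]]) -> float:
    """Average AUC over cells; each hypothetical cell key (e.g. "model/language")
    maps to that cell's probe scores and vulnerability labels."""
    return mean(roc_auc(scores, labels) for scores, labels in per_cell.values())
```

The pairwise loop is quadratic and meant only to make the definition explicit; a rank-based computation gives the same value more efficiently on large probe datasets.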