A Systematic Evaluation of Parameter-Efficient Fine-Tuning Methods for the Security of Code LLMs

📅 2025-09-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models for code generation frequently produce insecure code, posing significant risks to software development. This paper systematically evaluates seven parameter-efficient fine-tuning (PEFT) methods for improving security-aware code generation in large models, focusing on Python and Java, while prioritizing functional correctness. Prompt tuning emerges as the most effective method, raising the Overall-Secure-Rate of CodeGen2-16B from a 67.28% baseline to 80.86%, a 13.5-percentage-point gain; optimizing the sampling temperature during decoding further lifts security to 87.65%. The paper also assesses adversarial robustness under the TrojanPuzzle poisoning framework, where prompt and prefix tuning prove most resilient. The overall improvement corresponds to roughly 203,700 fewer vulnerable code snippets per million generated, demonstrating a scalable, robust, and lightweight optimization paradigm for secure code generation.

📝 Abstract
Code-generating Large Language Models (LLMs) significantly accelerate software development. However, their frequent generation of insecure code presents serious risks. We present a comprehensive evaluation of seven parameter-efficient fine-tuning (PEFT) techniques, demonstrating substantial gains in secure code generation without compromising functionality. Our research identifies prompt-tuning as the most effective PEFT method, achieving an 80.86% Overall-Secure-Rate on CodeGen2 16B, a 13.5-point improvement over the 67.28% baseline. Optimizing decoding strategies through sampling temperature further elevated security to 87.65%. This equates to a reduction of approximately 203,700 vulnerable code snippets per million generated. Moreover, prompt and prefix tuning increase robustness against poisoning attacks in our TrojanPuzzle evaluation, with strong performance against CWE-79 and CWE-502 attack vectors. Our findings generalize across Python and Java, confirming prompt-tuning's consistent effectiveness. This study provides essential insights and practical guidance for building more resilient software systems with LLMs.
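The abstract identifies prompt tuning as the most effective PEFT method: the base model's weights stay frozen and only a short sequence of trainable "soft prompt" embeddings is prepended to the input. A minimal sketch of that idea follows; the toy model, vocabulary size, embedding width, and prompt length are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

VOCAB, DIM, PROMPT_LEN = 1000, 64, 8  # illustrative sizes, not the paper's

class ToyLM(nn.Module):
    """Stand-in for a frozen code LLM (embedding layer + LM head only)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.lm_head = nn.Linear(DIM, VOCAB)

    def forward(self, inputs_embeds):
        return self.lm_head(inputs_embeds)

model = ToyLM()
for p in model.parameters():  # freeze every base-model weight
    p.requires_grad = False

# The only trainable parameters: PROMPT_LEN virtual-token embeddings.
soft_prompt = nn.Parameter(torch.randn(PROMPT_LEN, DIM) * 0.02)

def forward_with_prompt(input_ids):
    tok = model.embed(input_ids)                   # (batch, seq, dim)
    prompt = soft_prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
    return model(torch.cat([prompt, tok], dim=1))  # prepend soft prompt

ids = torch.randint(0, VOCAB, (2, 10))
logits = forward_with_prompt(ids)                  # (2, 10 + PROMPT_LEN, VOCAB)
trainable = soft_prompt.numel()
frozen = sum(p.numel() for p in model.parameters())
print(logits.shape, trainable, frozen)
```

Only the 512 soft-prompt values receive gradient updates, versus 129,000 frozen base parameters even in this toy setup, which is what makes the method "parameter-efficient" at 16B scale.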
Problem

Research questions and friction points this paper is trying to address.

Evaluating parameter-efficient fine-tuning for secure code generation
Identifying the most effective methods for reducing vulnerable code snippets
Assessing robustness against data-poisoning attacks on code LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated seven parameter-efficient fine-tuning techniques
Identified prompt-tuning as the most effective method
Optimized decoding strategies via sampling temperature
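The last point refers to temperature-scaled sampling: dividing the next-token logits by a temperature T before the softmax, so T < 1 sharpens the distribution toward the model's top choice while T > 1 flattens it. A minimal, self-contained sketch of the mechanism (the logits below are made up for illustration and the paper's chosen temperature is not reproduced here):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Draw a token index from softmax(logits / temperature).

    Returns (sampled_index, probability_list). T < 1 concentrates mass on
    the highest logit; T > 1 spreads it across alternatives.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i, probs
    return len(probs) - 1, probs

logits = [2.0, 1.0, 0.5]                           # hypothetical next-token logits
_, cold = sample_with_temperature(logits, temperature=0.2)
_, hot = sample_with_temperature(logits, temperature=2.0)
print(cold[0], hot[0])                             # P(top token) at each temperature
```

At T = 0.2 the top token absorbs nearly all the probability mass, while at T = 2.0 lower-ranked tokens remain plausible; tuning this single decoding knob is how the paper lifts security from 80.86% to 87.65% without touching model weights.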