Detecting LLM-generated Code with Subtle Modification by Adversarial Training

📅 2025-07-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing detectors struggle to identify LLM-generated code after minor perturbations, such as variable renaming or structural reorganization, undermining reliable provenance attribution. Method: We propose CodeGPTSensor+, a robust detection framework that integrates the MIST (Multi-objective Identifier and Structure Transformation) adversarial sample generation module into the CodeGPTSensor architecture, combining adversarial training with syntax-aware, semantics-preserving code transformations to improve generalization against diverse modifications. Contribution/Results: Evaluated on the HMCorp dataset, the method maintains high accuracy on the original test set while significantly improving detection accuracy on adversarial test sets, demonstrating strong robustness to fine-grained code perturbations. This work establishes a deployable, tamper-resistant paradigm for compliance auditing of LLM-generated code, advancing practical code provenance verification in real-world software development and security contexts.

📝 Abstract
With the rapid development of Large Language Models (LLMs), their powerful code-generation capabilities have been widely applied in tasks like code completion and automated development, substantially improving coding efficiency. However, the extensive use of LLM-generated code also raises several new challenges. On the one hand, issues such as the regulation of code provenance, copyright disputes, and code quality have become increasingly pressing. Effectively detecting LLM-generated code and ensuring its compliant and responsible use has become a critical and urgent issue. On the other hand, in practical applications, LLM-generated code is often subject to manual modifications, such as variable renaming or structural adjustments. Although some recent studies have proposed training-based and zero-shot methods for detecting LLM-generated code, these approaches show insufficient robustness when facing modified LLM-generated code, and an effective solution is still lacking. To address the real-world scenario where LLM-generated code may undergo minor modifications, we propose CodeGPTSensor+, an enhanced version of CodeGPTSensor, which employs adversarial training to improve robustness against input perturbations. CodeGPTSensor+ integrates an adversarial sample generation module, Multi-objective Identifier and Structure Transformation (MIST), which systematically generates high-quality and representative adversarial samples. This module effectively enhances the model's resistance against diverse adversarial attacks. Experimental results on the HMCorp dataset demonstrate that CodeGPTSensor+ significantly improves detection accuracy on the adversarial test set while maintaining high accuracy on the original test set, showcasing superior robustness compared to CodeGPTSensor.
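The perturbations the abstract describes, such as variable renaming, preserve a program's behavior while shifting its surface token statistics, which is exactly why surface-level detectors degrade. A minimal sketch of such a semantics-preserving rename, using Python's standard `ast` module (the `IdentifierRenamer` class and the name mapping are illustrative, not part of the paper's method):

```python
import ast

class IdentifierRenamer(ast.NodeTransformer):
    """Rename selected identifiers while preserving program semantics."""
    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Rewrite any occurrence (load or store) of a mapped identifier.
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

original = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

tree = ast.parse(original)
# The renamed program computes the same function, but a detector keyed
# to identifier choices now sees a different token distribution.
renamed = ast.unparse(IdentifierRenamer({"result": "acc", "v": "item"}).visit(tree))
print(renamed)
```

Structural adjustments (e.g., loop-to-comprehension rewrites) work the same way: behavior is fixed, the syntax tree changes.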
Problem

Research questions and friction points this paper is trying to address.

Detect LLM-generated code with subtle adversarial modifications
Address robustness gaps in existing LLM code detection methods
Resolve copyright and quality concerns in LLM-generated code usage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial training enhances detection robustness
MIST generates diverse adversarial samples
Improved accuracy on modified LLM-generated code
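The adversarial-training idea in the bullets above can be sketched as data augmentation: each labeled sample is paired with a semantics-preserving variant carrying the same label before refitting. This is a generic sketch, not the paper's MIST implementation; `transform`, `adversarial_training_round`, and the toy rename are hypothetical stand-ins:

```python
import random

def transform(code_sample):
    """Toy stand-in for a MIST-style semantics-preserving transformation:
    here, just rename a known identifier (hypothetical example)."""
    return code_sample.replace("result", f"var_{random.randint(0, 99)}")

def adversarial_training_round(fit, labeled_samples):
    """One augmentation round: keep each (code, label) pair and add a
    perturbed variant with the same label, then refit the detector."""
    augmented = []
    for code, label in labeled_samples:
        augmented.append((code, label))
        augmented.append((transform(code), label))
    return fit(augmented)
```

In the paper's setting, `fit` would retrain the CodeGPTSensor classifier so that the human/LLM decision boundary no longer hinges on surface features the transformations can change.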