Fine-Tuning Pre-Trained Code Models for AI-Generated Code Detection

📅 2026-05-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses AI-generated code detection and source-model attribution with separate fine-tuning strategies for two subtasks: binary classification (human-written vs. AI-generated) and 11-class attribution of the generating model. Four pre-trained code models (CodeBERT, GraphCodeBERT, UniXcoder, and CodeT5+) are fine-tuned. Subtask-A uses leave-one-language-out cross-validation, code augmentation, chunked inference with trimmed-mean aggregation, and threshold calibration on a difficult dataset. Subtask-B uses sandwich token packing, class-balanced loss, and multi-seed ensembling with test-time augmentation. On SemEval-2026 Task 13, the system reaches a macro-F1 score of 0.737 on Subtask-A (6th of 81 teams) and 0.422 on Subtask-B (7th of 34 teams).
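As a rough illustration of leave-one-language-out cross-validation, the sketch below builds one fold per programming language, holding that language out for validation and training on all others. The function name `leave_one_language_out` and the toy language labels are assumptions for illustration only; the summary does not describe the authors' actual implementation.

```python
from collections import defaultdict


def leave_one_language_out(languages):
    """Yield (held_out_language, train_indices, val_indices), one fold per language.

    `languages` holds the programming-language label of each sample.
    Hypothetical helper, not the authors' code.
    """
    by_lang = defaultdict(list)
    for idx, lang in enumerate(languages):
        by_lang[lang].append(idx)

    for held_out in sorted(by_lang):
        val_idx = by_lang[held_out]
        # Train on every sample whose language differs from the held-out one.
        train_idx = sorted(
            i for lang, idxs in by_lang.items() if lang != held_out for i in idxs
        )
        yield held_out, train_idx, val_idx


# Toy labels (assumed data): one language tag per code sample.
sample_languages = ["python", "java", "python", "cpp", "java", "cpp"]
folds = list(leave_one_language_out(sample_languages))

for held_out, train_idx, val_idx in folds:
    print(held_out, train_idx, val_idx)
```

Because each validation fold contains only a language the model never saw during training, the fold score estimates how well a detector transfers to an unseen language rather than how well it fits the training languages.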

📝 Abstract

This paper describes the system submitted by team Archaeology to SemEval-2026 Task 13 on AI-generated code detection. The shared task consists of three subtasks; we participate in Subtask-A (binary classification: human-written vs. AI-generated code) and Subtask-B (11-class attribution of the generating model). Starting from a TF-IDF and Logistic Regression baseline, we fine-tune four pre-trained code models (CodeBERT, GraphCodeBERT, UniXcoder, and CodeT5+) with separate strategies for each subtask. For Subtask-A, we use leave-one-language-out cross-validation, code augmentation, chunked inference with trimmed-mean aggregation, and threshold calibration on a difficult dataset. For Subtask-B, we use sandwich token packing, class-balanced loss, and multi-seed ensembling with test-time augmentation. Our best submissions obtain macro-F1 scores of 0.737 on Subtask-A (6th/81 teams) and 0.422 on Subtask-B (7th/34 teams).
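Chunked inference with trimmed-mean aggregation can be sketched as follows: a long input is split into overlapping windows, each window is scored separately, and the per-window scores are averaged after discarding the most extreme values at both ends. This is a minimal sketch under stated assumptions. The window size, stride, trim fraction, and the `score_chunk` callback (standing in for a fine-tuned classifier) are all hypothetical, since the abstract does not specify them.

```python
def chunk_tokens(tokens, chunk_size, stride):
    """Split a token list into overlapping windows of at most chunk_size tokens.

    Assumes stride > 0 so that the window always advances.
    """
    if len(tokens) <= chunk_size:
        return [tokens]
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
        start += stride
    return chunks


def trimmed_mean(values, trim_fraction=0.2):
    """Mean after dropping trim_fraction of the values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    # Fall back to the plain mean if trimming would remove everything.
    kept = ordered[k:len(ordered) - k] if len(ordered) > 2 * k else ordered
    return sum(kept) / len(kept)


def predict_document(tokens, score_chunk, chunk_size=512, stride=256,
                     trim_fraction=0.2):
    """Score each chunk with score_chunk (assumed to return P(AI-generated))
    and aggregate the chunk scores with a trimmed mean."""
    scores = [score_chunk(c) for c in chunk_tokens(tokens, chunk_size, stride)]
    return trimmed_mean(scores, trim_fraction)


# Toy usage with a stand-in scorer (assumption: real scores come from a model).
windows = chunk_tokens(list(range(10)), chunk_size=4, stride=2)
robust_score = trimmed_mean([0.1, 0.9, 0.8, 0.85, 0.95], trim_fraction=0.2)
doc_score = predict_document(list(range(10)), lambda c: len(c) / 4,
                             chunk_size=4, stride=2)
```

Compared with a plain mean, the trimmed mean keeps a single outlier chunk (for example, a boilerplate license header scored very differently from the rest of the file) from dominating the document-level prediction.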

Problem

Research questions and friction points this paper is trying to address.

AI-generated code detection

code attribution

binary classification

pre-trained code models

code provenance

Innovation

Methods, ideas, or system contributions that make the work stand out.

code augmentation

leave-one-language-out cross-validation

sandwich token packing