MultiAIGCD: A Comprehensive Dataset for AI-Generated Code Detection Covering Multiple Languages, Models, Prompts, and Scenarios

📅 2025-07-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
The proliferation of AI-generated code threatens academic integrity and hiring fairness, yet existing detection benchmarks lack cross-language coverage, model diversity, and real-world scenario relevance. Method: We introduce MultiAIGCD, a comprehensive benchmark for detecting AI-generated code, covering Python, Java, and Go; six large language models (LLMs); multiple prompting strategies; and three authentic settings: education, recruitment, and open-source development. Built upon CodeNet tasks and high-quality human-written reference solutions, it comprises 121,271 AI-generated and 32,148 human-written code samples, paired across languages, models, and prompts. Contribution/Results: The dataset is fully open-sourced. We systematically evaluate mainstream detectors on cross-model generalization and cross-lingual transferability, enhancing the practicality, robustness, and ecological validity of code-origin detection benchmarks.

📝 Abstract
As large language models (LLMs) rapidly advance, their role in code generation has expanded significantly. While this offers streamlined development, it also creates concerns in areas like education and job interviews. Consequently, developing robust systems to detect AI-generated code is imperative to maintain academic integrity and ensure fairness in hiring processes. In this study, we introduce MultiAIGCD, a dataset for AI-generated code detection for Python, Java, and Go. From the CodeNet dataset's problem definitions and human-authored code, we generate several code samples in Java, Python, and Go with six different LLMs and three different prompts. This generation process covered three key usage scenarios: (i) generating code from problem descriptions, (ii) fixing runtime errors in human-written code, and (iii) correcting incorrect outputs. Overall, MultiAIGCD consists of 121,271 AI-generated and 32,148 human-written code snippets. We also benchmark three state-of-the-art AI-generated code detection models and assess their performance in various test scenarios, such as cross-model and cross-language. We share our dataset and code to support research in this field.
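The three usage scenarios from the abstract can be illustrated with a small prompt-construction sketch. This is a hypothetical illustration, not the authors' released code: the function name, prompt wording, and parameters are all assumptions.

```python
# Hypothetical sketch of the three MultiAIGCD generation scenarios:
# (i) generate from a problem description, (ii) fix a runtime error,
# (iii) correct wrong output. Templates are illustrative only.

def build_prompt(scenario, problem, code=None, expected=None, actual=None):
    """Build an LLM prompt for one of the three usage scenarios."""
    if scenario == "generate":
        # (i) code from problem description alone
        return f"Solve the following problem in Python:\n{problem}"
    if scenario == "fix_runtime":
        # (ii) repair human-written code that crashes at runtime
        return (f"The following solution to this problem raises a runtime "
                f"error. Fix it.\nProblem:\n{problem}\nCode:\n{code}")
    if scenario == "fix_output":
        # (iii) repair human-written code that produces wrong output
        return (f"The following solution produces incorrect output. Fix it.\n"
                f"Problem:\n{problem}\nCode:\n{code}\n"
                f"Expected output: {expected}\nActual output: {actual}")
    raise ValueError(f"unknown scenario: {scenario}")

prompt = build_prompt("generate", "Read two integers and print their sum.")
```

Under this framing, each (problem, scenario, LLM) combination yields one AI-generated sample, which the dataset pairs against human-written solutions for the same CodeNet task.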
Problem

Research questions and friction points this paper is trying to address.

Detect AI-generated code across multiple languages and models
Address concerns in education and hiring integrity
Evaluate detection models in cross-language and cross-model scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-language dataset for AI code detection
Generated with multiple LLMs and prompts
Benchmarked cross-model and cross-language performance
Basak Demirok
Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Türkiye
Mucahid Kutlu
Assistant Professor, Qatar University
Information Retrieval, Natural Language Processing
Selin Mergen
Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Türkiye